26^MAY,  1978 


This  report  has  been  reviewed  by  the  Office  of  Information,  RADC,  and  is 
releasable  to  the  National  Technical  Information  Service  (NTIS).  At  NTIS 
it  will  be  releasable  to  the  general  public,  including  foreign  nations. 




RADC's  SPECTRUM  ESTIMATION  WORKSHOP 


AGENDA

MAY 24, 25, AND 26 1978

Wednesday, May 24

0800       Registration

0900-0915  Welcome - Colonel Owen R. Lawter
           Chief, Surveillance Division

0915-0945  RADC Overview - Dr. John Burgess
           Chief Scientist

0945-1015  Approaches to Spectral Analysis - New and Used,
           Dr. Lester Gerhardt (Co-chairman), RPI

1015       Coffee Break

Session I  Lt Colonel William Cuneo, DARPA

1045       Frequency Resolution of High-Resolution Spectrum Analysis
           Techniques, Larry Marple, Signal Science, Inc.

1105       Autoregressive Model Spectral Estimation, Some Simulation Study
           Statistical Performance Results, Will Gersch and R.Z. Liu,
           University of Hawaii

1125       High Resolution Spectral Estimation via Rational Models,
           Mostafa Kaveh, University of Minnesota

1200       Adjourn for lunch

Session II Dr. Henry Radoski, AFOSR

1330       Maximum Likelihood Spectral Estimation Using State-Variable
           Techniques, Robert McAulay, MIT/Lincoln Laboratory

1350       An Optimum Filter for Spectral Estimation Based on a Penalty
           Function Approach, R.J.P. deFigueiredo, Rice University



DISTRIBUTION STATEMENT

Approved  for  public  release; 
Distribution  Unlimited 


1410       A Solution to the Problem of Spontaneous Line Splitting in
           Maximum Entropy Power Spectrum Analysis of Complex Signals,
           Paul F. Fougere, AF Geophysics Laboratory

1430       Coffee Break

1500       Adaptive Extrapolation and Hidden Periodicities,
           A. Papoulis and C. Chamzas, Polytechnic Institute of New York

1520       Recursive Spectral Estimation of α-Stationary Processes,
           Dr. M. Morf and D. T. Lee, Stanford University

1540       Improved Spectral Estimation From Incomplete Sampled-Data
           Observations, James A. Cadzow, Virginia Polytechnic Institute
           & State University

1615       Adjourn for the day

Thursday, May 25

Session III  Dr. Donald Burlage (DRDMITER) U.S.A. R&D Missile Command

0830       A Review of Prony's Method Techniques for Parameter Estimation,
           Michael L. Van Blaricum, Mission Research Corporation

0850       System Identification by Use of Pencil-of-Functions,
           V.K. Jain, University of South Florida; D.D. Weiner and
           J. Nebat, Syracuse University; T.K. Sarkar, Rochester
           Institute of Technology

0910       Two Dimensional Spectral Estimation, Anil K. Jain,
           Surendra Ranganath, State University of New York at Buffalo

0930       Lattice Methods in Spectral Estimation, John Makhoul,
           Bolt Beranek & Newman, Inc.

1000       Coffee Break

1030       Non-Parametric Spectrum Estimates Motivated by the Wishart
           Distribution, David J. Thomson, Bell Telephone Laboratories

1050       Air Vehicle Detection Using Advanced Spectral Estimation
           Techniques, Philip G. Tomlinson, Guy A. Ackerson,
           Decision-Science Applications, Inc.

1110       Radar Imaging of Discrete Targets with Maximum Entropy
           Techniques, Stephen B. Bowling, MIT/Lincoln Laboratories

1200       Lunch

Session IV Dr. Gerard Trunk, Naval Research Laboratory

1330       Application of Maximum Entropy Frequency Analysis to Synthetic
           Aperture Radar, Philip L. Jackson, Lawrence S. Joyce and
           Gerald B. Feldkamp, Environmental Research Institute of Michigan

1350       Antenna Patterns Computed with Maximum Entropy and the Burg
           Technique, William R. King, Consultant, Alexandria VA

1410       Maximum Entropy Cepstral Analysis, Tom Landers,
           MIT/Lincoln Laboratories

1430       Coffee Break

1500       A New Adaptive Filter for Radar Clutter Rejection, D.E. Bowyer,
           P.K. Rajasekaran, W.W. Gebhart, Teledyne Brown Engineering

1520       Doppler Spectrum Estimation for Continuously Distributed Radar
           Targets, George R. Cooper, Clare D. McGillem, Purdue University

1540       Instantaneous Frequency Estimation from Sampled Data,
           William R. Carmichael, Richard G. Wiley, Syracuse Research
           Corporation

1615       Adjourn for the day

Friday, May 26

0830       A Comparison of Solutions to the Workshop Problems,
           Dr. Lester Gerhardt (Co-Chairman), RPI

1030       Coffee Break

1100       Workshop Panel Activity

1230       Adjourn the Workshop



PREFACE 



This  workshop  provided  a means  for  key  researchers  in  the  field  to 
describe  their  work  and  also  provided  a means  for  comparing  the  work  of 
various  researchers  using  a common  data  base  for  representative  problems 
of  importance  to  the  Air  Force.  This  report  is  a collection  of  papers 
that  were  submitted  for  presentation  at  RADC's  Spectrum  Estimation 
Workshop  held  24-26  May  1978  at  Griffiss  Air  Force  Base,  N.Y.  13441.  The 
papers  were  published  as  received  by  RADC  and  have  not  been  edited. 
Further,  publication  of  these  papers  does  not  represent  approval  or 
endorsement  by  the  Rome  Air  Development  Center  or  the  U.S.  Air  Force. 

The researchers were also presented with a set of sample problems
called the Spectral Estimation Experiment. The object of this experiment
was to establish a basis for comparison of the wide variety of techniques
available as a function of selected applications on both real and
artificial data sets representing specialized problem classes which are of
interest to the government. The common data base offers several
additional advantages.

Three  different  problems  have  been  formulated  by  the  workshop 
committee.  They  fall  generally  into  the  areas  of  radar,  pattern 
recognition  and  system  identification. 

The detailed description of the problem and the solutions as
determined by the many different algorithms employed will be published
separately.

Abstract

This paper served as the introductory address for the Spectral
Estimation Workshop sponsored by Rome Air Development Center in May 1978.
Following a short history of the development of techniques for spectral
estimation, the major approaches are summarized. These include the more
traditional transform approach, the rediscovered autoregressive estimator
and related maximum entropy spectral analysis (MESA) methods, and Prony's
method. Also covered are new approaches including an algorithm for band
limited extrapolation, among other methods. The major approaches are then
simply compared with respect to data required, resolution, application, and
sensitivity to noise. Finally, the papers presented at the Workshop are
briefly reviewed and grouped with respect to the classes of techniques
discussed.

Introduction  and  Background 

In the way of historical background, the "raison d'etre" for this
Workshop probably is best explained by the recent rebirth of the
autoregressive spectral estimation techniques. The potential of such methods
to more accurately estimate parameters of certain types of spectra,
particularly peaked spectra characteristic of radar applications, was
initially viewed by many as offering "super-resolution" with capability
beyond the diffraction limit (in any case better than the classical
techniques in resolution for certain signal to noise ratios). As a result of
investigations and applications of these methods, a broader and more
generalized view towards spectral estimation evolved. This led eventually to
the organization of the Workshop, for the consideration and comparison of
several classes of approaches to spectral estimation by experts in the field
advocating the different methods. In the process, a more realistic view of
the advantages and liabilities of each of the approaches was formulated,
coupled with a better understanding of the interrelationships that exist
among the techniques and an appreciation of the historical significance of
their development and use.


*The contributions of E. Pflug, ESE student at R.P.I., to the literature
survey and technical discussions are gratefully acknowledged.

In the way of mathematical background, spectral analysis is well
accepted as a tool to aid the understanding of the signals encountered in
the physical world. The mathematical foundations are due to the French
mathematician Baron Jean-Baptiste Fourier, who established the explicit
relation between the time domain function (signal) and its unique frequency
domain function (the spectrum), hence the name Fourier Spectrum. Despite
the establishment of this relationship, it was more than 100 years before
major applications involved the concept (1). The Fourier transform pair
relating a continuous aperiodic signal with its transform is simply

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega$$

It is usually the magnitude of the transform that is termed the spectrum of
the signal. More often, the signal itself is random or stochastic but
stationary in nature, and can be best described by its correlation function
$R(\tau)$. The Fourier transform of the correlation function is the power
spectral density $\Phi(\omega)$; together the two form a Fourier transform
pair.

$$\Phi(\omega) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j\omega\tau}\, d\tau$$

$$R(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi(\omega)\, e^{j\omega\tau}\, d\omega$$

The above basic concepts apply to continuous signals and imply infinite
limits; that is, an infinite amount of data is needed to obtain the spectrum.


In more practical situations, the signals of interest are sampled in
time, thus forming a discrete time signal. Such signals are transformed to
a discrete line spectrum using a discrete Fourier transform (DFT). The DFT
is expressed as

$$F(k\Omega) = \sum_{n=0}^{N-1} f(nT)\, e^{-j 2\pi n k / N}$$

where $k = 0, 1, \ldots, (N-1)$, $\Omega = 2\pi/(NT)$ is the spacing between
spectral lines, and $f(nT)$ are uniformly spaced samples of the waveform at
intervals $T$.

As above, only a finite number of the signal samples are available.
Thus the problem is to produce a spectrum of a finite history of the signal
which best estimates the actual spectrum. It is to this end that the
approaches to follow are addressed.

In many cases, continuous or discrete, the signal characteristics
(spectral) change with time, that is, the signal is nonstationary.
Consequently, one must deal with time varying or short term spectra, a
function $F(\omega, t)$. This may be displayed in a time varying frequency
format in two dimensions. In any of the above cases there is a clear need to
update the spectral estimate, either because it uses a finite number of data
points, or because the characteristics are time varying, or both. This need
for continued recomputation of the transform led to the discovery of so
called fast transforms such as the FFT (Fast Fourier Transform), which
greatly reduce the computations needed and permit higher speed processing.
An FFT typically reduces computations from $N^2$ to $N \log_2 N$, on the
order of 2% of that of a DFT for N = 500.
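The operation counts quoted above can be checked with a short sketch. It is an illustration only, not an algorithm taken from the Workshop papers: the function names `dft`, `fft` and `cost_ratio` are assumptions, the FFT shown is the textbook recursive radix-2 form (so it needs a power-of-two length), and the N = 500 figure is evaluated purely as the arithmetic ratio (N log2 N)/N^2.

```python
import cmath
import math

def dft(x):
    """Direct DFT: N complex multiply-adds for each of N outputs (about N**2 total)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def cost_ratio(n):
    """Approximate FFT-to-DFT operation-count ratio, (N log2 N) / N**2."""
    return math.log2(n) / n

# A 16-sample cosine with exactly three cycles in the record.
samples = [math.cos(2 * math.pi * 3 * m / 16) for m in range(16)]
max_diff = max(abs(a - b) for a, b in zip(dft(samples), fft(samples)))
print(f"largest DFT/FFT difference: {max_diff:.2e}")
print(f"cost ratio for N = 500: {cost_ratio(500):.1%}")
```

Both routines return the same line spectrum; only the amount of arithmetic differs, which is why the ratio falls to roughly 2% near N = 500.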

Some of the most influential work in developing an estimate of the
power spectrum using a finite number of samples or time history of the
signal is that of Blackman and Tukey (2, 3). This approach will be
discussed shortly in more detail. The overall approach, which modifies the
original data by weighting it by an appropriate window function or weighting
function, yields a result which is the convolution of the transform of the
window function with the spectrum of the portion of the signal selected.
Because it tends to taper the original signal values, it is sometimes called
the transform and taper (TT) approach (2). This approach remained one of
the most popular for a period of about ten years (in the open literature).

In the late sixties, a variety of non-transform and taper approaches
came to light. These included the autoregressive method (designated AR) by
Parzen (4), and the maximum entropy method (MEM) of spectral analysis
(MESA) developed by Burg (5), each from slightly different perspectives and
fields. In several papers these techniques are referred to in a common
context, attesting to their similarity (6, 7). About the same time Capon
introduced what is designated the maximum likelihood technique (8). In a
paper in this Workshop (by McAulay) it is shown that the maximum likelihood
spectral estimator corresponds to solving the same normal equations as for
the MEM for an all pole model with large signal to noise ratio, whereas for
low signal to noise the MEM is no longer optimum. Regardless of their
similarity, one thing is clear. These techniques offered a different
approach to spectral estimation than the more classical window approach, and
they were destined to be compared.

Representative of such investigations of these new techniques, H. Akaike
developed an error criterion for AR techniques in 1969 (9) called the final
prediction error (FPE). He soon related this to spectrum estimation (10).
In 1971, R. T. Lacoss formulated a paper comparing the MEM and ML methods
(11). In 1972, T. J. Ulrych applied the MEM method to truncated sinusoids
(12) and K. N. Berk studied the consistency and asymptotic characteristics
of the AR method (18). In 1973, Edward and Fitelson (13), Gersch and Sharpe
(14) and Ulrych, Smylie, Jensen, and Clarke (15) further extended these new
techniques. The list goes on, with Radoski, Fougere, and Zawalick (16) and
Jones (17) investigating the MEM and AR methods respectively. In 1976, Kaveh
and Cooper showed the equivalency of the AR and MEM methods and the
comparison to the TT methods and investigated their properties, with some
numerical examples (2). Many others continued with applications and further
investigations. Some are given in references (19-26).


An interesting aspect to note is the diversity of fields represented by
these various investigators, many of whom are contributors to this Workshop.
They range from the estimation theory of the Communications and Control
areas of Electrical Engineering to the Geophysical Scientists, from
academicians to engineers, etc. There is no question that this subject offers
the opportunity to bring together the talents of many to try to merge the
knowledge of systems identification, adaptive systems, signal processing
using linear predictive coding, recursive estimation, Kalman filtering,
classical transform analysis, statistical analysis, and computer technology,
to name a few, to help compare these new and old methods and evaluate their
place in spectral estimation. It is to this end that this Workshop is
directed.

The  paper  will  now  briefly  review  aspects  of  the  major  methods  to  be 
later  considered  by  the  investigators,  taking  care  not  to  steal  any  of 
their  thunder.  Some  of  the  strengths  and  weaknesses  will  be  emphasized. 

Last  will  be  a review  of  the  papers  to  be  presented  in  the  light  of  the 
areas  described. 


Traditional  Spectral  Analysis 

The  most  straightforward  approach  to  spectral  analysis  is  to  supply  the 
time  domain  signal  to  a bank  of  narrow  band  filters.  The  bank  of  filters 
approximates  the  Fourier  transform  in  the  limit.  The  technique  in  concept 
offers  almost  ideal  spectral  analysis  limited  only  by  practicality.  Similar 
approaches  also  use  adaptive  filters  which  track  major  concentrations  of 
energy  in  the  spectrum  when  it  is  time  varying  such  as  in  speech,  sonar, 
medical  signals,  etc.  (27).  Variations  of  this  approach  use  swept  filter 
analyzers  which  simplify  the  circuitry  requirements  but  at  a sacrifice  in 
performance.  They  require  longer  acquisition  times  for  high  resolution, 
better linearity with a wide dynamic range, and greater stability. A
further extension of such ideas is the time compression analyzer.
Nonetheless, the filter bank approach remains one of the most popular,
particularly when implemented digitally.

The resolution of such a system is limited by the number of filters
(and therefore the bandwidth of each filter) placed across the frequency
range of interest. In practice, with these realized digitally, it is
usually the computation time which limits the effective resolution. From
another perspective, the time that the filter must be exposed to the signal
is inversely proportional to the filter bandwidth (the Heisenberg Uncertainty
Principle). As a result, the finer the frequency resolution desired, the
more time (or the larger the number of samples) needed. Clearly, we are
dealing with a finite time (or number of samples) which must be increased to
obtain finer frequency resolution. This is a fundamental difficulty with
this classical approach even if implemented digitally.
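The trade between record length and frequency resolution can be seen in a small numerical experiment. The sketch below is an illustration under stated assumptions rather than anything taken from the Workshop data: two unit-amplitude complex tones at 0.20 and 0.25 cycles per sample (hypothetical values), in phase at the centre of the record, are analysed with a plain periodogram. The names `periodogram`, `two_tones` and `count_peaks` are introduced here for the example only.

```python
import cmath
import math

def periodogram(x, freqs):
    """|X(f)|**2 of a finite record, evaluated at each f (cycles per sample)."""
    return [abs(sum(v * cmath.exp(-2j * math.pi * f * n)
                    for n, v in enumerate(x))) ** 2
            for f in freqs]

def two_tones(n_samples, f1=0.20, f2=0.25):
    """Two unit-amplitude complex tones, in phase at the centre of the record."""
    centre = (n_samples - 1) / 2
    return [cmath.exp(2j * math.pi * f1 * (n - centre))
            + cmath.exp(2j * math.pi * f2 * (n - centre))
            for n in range(n_samples)]

def count_peaks(power):
    """Count local maxima that reach at least half of the largest value."""
    floor = 0.5 * max(power)
    return sum(1 for i in range(1, len(power) - 1)
               if power[i] >= floor and power[i - 1] < power[i] > power[i + 1])

grid = [0.10 + 0.001 * i for i in range(251)]   # 0.10 to 0.35 cycles/sample
for n_samples in (10, 64):
    peaks = count_peaks(periodogram(two_tones(n_samples), grid))
    print(n_samples, "samples ->", peaks, "peak(s)")
```

The tone spacing of 0.05 is half the reciprocal of the 10-sample record length, so the two main lobes merge into one; with 64 samples the spacing is more than three times the reciprocal record length and the tones separate.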

The second inherent difficulty (or degree of flexibility depending on
your outlook) is the effect of the window function on the spectrum obtained.
The use of a finite amount of data may be easily interpreted as a
multiplication of the original signal by a window function (viewed as a
pulse in the simplest case). The result is a spectrum which is the
convolution of the original spectrum (which was desired) with the transform
of the window function. The result may be very different from that desired,
depending on the selection of the window function.


In a more exact way, a stochastic signal must be treated in terms of
its correlation function as mentioned before. Let $x(t)$ be the given
observed signal of finite length $T_n$ (3). Then

$$c(\tau) = \frac{1}{T_n - |\tau|} \int_{-(T_n - |\tau|)/2}^{(T_n - |\tau|)/2} x(t - \tau/2)\, x(t + \tau/2)\, dt, \qquad |\tau| < T_n$$

where $c(\tau)$ is the approximate autocorrelation function at lag $\tau$.

Let $d(\tau)$ be the window function, an even function of $\tau$. Then the
modified approximate autocorrelation function is given by:

$$c_m(\tau) = d(\tau)\, c(\tau)$$

and the power spectrum estimate is given by:

$$C_m(f) = D(f) * C(f)$$

where * represents the mathematical operation of convolution. In actuality,
$C_m(f)$ is found by Fourier transforming $c_m(\tau)$ (windowing of the
original signal is also possible). This is commonly known as the correlation
approach, and is closely related to the periodogram approach.
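A discrete-time version of the correlation approach can be sketched in a few lines. The sketch makes several assumptions that are not in the text: the integral is replaced by a lagged-product sum divided by the number of overlapping samples, the window $d(\tau)$ is taken to be a raised-cosine (Hann-type) lag window, and the names `autocorr`, `hann_lag_window` and `blackman_tukey` are chosen here for illustration.

```python
import math

def autocorr(x, max_lag):
    """Lagged-product estimate c[k] = sum(x[n] * x[n+k]) / (N - k), k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]

def hann_lag_window(max_lag):
    """Even lag window d[k], falling from 1 at lag 0 to 0 at lag max_lag."""
    return [0.5 * (1.0 + math.cos(math.pi * k / max_lag))
            for k in range(max_lag + 1)]

def blackman_tukey(x, max_lag, freqs):
    """Transform the windowed autocorrelation c_m[k] = d[k] * c[k]."""
    c = autocorr(x, max_lag)
    d = hann_lag_window(max_lag)
    cm = [dk * ck for dk, ck in zip(d, c)]
    # c_m is even in the lag, so its transform reduces to a cosine series.
    return [cm[0] + 2.0 * sum(cm[k] * math.cos(2 * math.pi * f * k)
                              for k in range(1, max_lag + 1))
            for f in freqs]

record = [math.cos(2 * math.pi * 0.1 * n) for n in range(200)]
freqs = [i / 400 for i in range(201)]          # 0 to 0.5 cycles/sample
spectrum = blackman_tukey(record, 40, freqs)
peak_freq = freqs[spectrum.index(max(spectrum))]
print("estimated peak frequency:", peak_freq)
```

The maximum lag (40 here) plays the role of the window length: a larger value narrows the main lobe of $D(f)$ at the price of a less stable estimate.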


This is a well researched field in digital signal processing. A good
summary of window functions and their effects appears in the IEEE
Proceedings, Jan. 1978, in a paper by Harris (22). The emphasis he puts on
the proper selection of the index of performance is important to note. These
windows must be designed to compromise between resolution and stability.
This essentially breaks down to making the transform of the window D(f)
block-like yet with low sidelobes. One naturally contradicts the other.
It is a restatement of the uncertainty principle.


Overall, then, the traditional approach is still limited by the tradeoff
between time measured data and frequency resolution obtainable, and the
proper selection of the window function, not unrelated questions. The
advantages and steps forward here in recent years have been mostly in
improvements in speed of computation using FFT's and array processing,
making this technique continually appealing. It still remains one of
the most popular approaches to stationary and even nonstationary spectral
analysis to date despite the drawbacks cited.




Autoregressive  and  Maximum  Entropy 



The traditional approaches (autocorrelation, periodogram, etc.) are
good when sufficient data is available (long compared to the reciprocal of
the lowest frequency of interest). In many problems such as radar and
seismic signal processing, this is often not the case. The more recently
developed methods of AR and MEM offer the possibility of better performance
on shorter records.

These new techniques are not only strongly related to each other, but to
other fields as well. These include work done in the past on system
identification where the objective is to model an unknown system using a
linear adaptive model with adjustable poles and zeroes (28), linear
predictive coding (29), among others.

The AR method uses a finite autoregression fit to the time series data
and calculates the spectrum from the autoregression coefficients (as well as
the error variance). A key problem is to determine the order of the
autoregression to be used. The difficulty is not unlike that confronting the
user in a pattern recognition problem in determining the optimum number of
features to use given a finite data set (such as in the work of Foley, Webb,
Gerhardt, etc.). Akaike's Information Criterion is the most popular method
used to determine the order of the regression, because of its effective use
of the final prediction error FPE. In essence, the AR parameters are fit
by least squares to the covariance sequence of the observed data (the
sets of equations to be solved for the coefficients are exactly those used
for linear predictive encoding (LPE)). When the order of the estimate is
the same as the order of the model that generated the data, the parameters
are known to be maximum likelihood estimates. The question as to the order
of the model has been the focus of much of the recent work by Akaike and
others.
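Order selection by final prediction error can be sketched as follows. The sketch is hedged on several points: it uses the commonly quoted form FPE(p) = (error variance at order p) x (N + p + 1)/(N - p - 1), it obtains the error variance at each order from a Levinson-type recursion on the autocorrelation lags, and it is fed the exact lags of a first-order process (r[k] = 0.5^k) with an assumed record length of N = 100 instead of measured data. The names `levinson` and `fpe_scores` are hypothetical.

```python
def levinson(r, max_order):
    """Order-recursive solution of the AR normal equations.

    r[k] is the autocorrelation at lag k. Returns the coefficients
    a_1..a_p for p = max_order (model x[n] ~ sum a_l * x[n-l]) and the
    prediction error variance at every order 0..max_order.
    """
    a = []
    err = r[0]
    errors = [err]
    for p in range(1, max_order + 1):
        acc = r[p] - sum(a[l] * r[p - 1 - l] for l in range(p - 1))
        k = acc / err                      # reflection coefficient of order p
        a = [a[l] - k * a[p - 2 - l] for l in range(p - 1)] + [k]
        err *= 1.0 - k * k
        errors.append(err)
    return a, errors

def fpe_scores(r, n_samples, max_order):
    """Final prediction error for each candidate order 0..max_order."""
    _, errors = levinson(r, max_order)
    return [e * (n_samples + p + 1) / (n_samples - p - 1)
            for p, e in enumerate(errors)]

lags = [0.5 ** k for k in range(5)]        # exact lags of a first-order process
scores = fpe_scores(lags, 100, 4)
best_order = scores.index(min(scores))
print("FPE by order:", [round(s, 4) for s in scores])
print("selected order:", best_order)
```

Beyond the true order the error variance stops falling while the penalty factor keeps growing, so the score turns upward and the minimum marks the order to use.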

Maximum Entropy is a method credited primarily to Burg, which provides
an estimate of the power spectral density which maximizes the entropy of a
stationary random process from the first N lags of the autocorrelation
function. It has been shown (30) that this method is in fact equivalent to
fitting an AR data model to the available time series data and closely
related to work appearing in the statistical literature previously. With
this similarity, it is only necessary to describe one of these techniques
in detail. The following parallels the explanation of AR given by
Griffiths (24) which appears in numerous other references as well.

The AR method of spectral analysis may be described in terms of the
whitening model shown in Fig. 1. The input data sequence x(k) is filtered
by a whitening operator having an impulse response with z transform

$$H(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_L z^{-L}$$

where $z^{-1}$ has been used to denote a unit delay. As is well known,

$$S_\varepsilon(\omega) = S_x(\omega)\, |H(\omega)|^2$$

In AR analysis, the L coefficients $a_1, a_2, \ldots, a_L$ are chosen so as
to minimize the power at the filter output. Zero output power is not possible
due to the fact that the leading coefficient is unity. Markel and Gray (29)
have shown that minimizing the output power is equivalent to providing the
flattest possible output spectral density $S_\varepsilon(\omega)$. The
appropriate equations for these coefficients are called the normal equations
or correlation equations, and are given in matrix form by


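$$\begin{bmatrix} r_x(0) & r_x(1) & \cdots & r_x(L-1) \\ r_x(1) & r_x(0) & \cdots & r_x(L-2) \\ \vdots & & \ddots & \vdots \\ r_x(L-1) & \cdots & & r_x(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_L \end{bmatrix} = \begin{bmatrix} r_x(1) \\ r_x(2) \\ \vdots \\ r_x(L) \end{bmatrix}$$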

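where $r_x(\ell)$ is the autocorrelation of the input data sequence at lag
$\ell$. If the resulting coefficients provide a truly flat output spectral
density, i.e.,

$$S_\varepsilon(\omega) = \sigma_\varepsilon^2, \quad \text{for all } \omega$$

then, combining, the input spectral density $S_x(\omega)$ may be expressed as

$$S_x(\omega) = \frac{\sigma_\varepsilon^2}{|H(\omega)|^2}$$

Since

$$H(\omega) = 1 - \sum_{\ell=1}^{L} a_\ell\, e^{-j\omega\ell}$$

and, as shown by Makhoul,

$$\sigma_\varepsilon^2 = r_x(0) - \sum_{\ell=1}^{L} a_\ell\, r_x(\ell)$$

the input power spectrum can be determined directly from the $a_\ell$, which
are called the AR coefficients.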


Of course, one cannot always expect a flat output spectrum for finite
values of L, which is the order of the AR model referred to before. This
occurs only if the second-order statistics of x(k) can be reproduced
exactly by a sequence which is generated by filtering white noise with a
filter containing no zeros and L or fewer poles (30). For this reason, AR
spectral analysis is often called the all-pole-model method of spectral
analysis.

The theory outlined above assumes that the data autocorrelation
function $r_x(\ell)$ is known exactly for lags $\ell = 0$ through
$\ell = L$. In practice, this function must be determined from direct
measurement on the data. Two common methods are currently being used. They
are the Yule-Walker and Burg procedures. Yule-Walker (YW) involves the
following three steps.

1) Estimate $r_x(\ell)$ for $\ell = 0, 1, \ldots, L$.

2) Substitute these values into the normal equations and solve for the AR
coefficients $a_1, \ldots, a_L$.

3) Compute a spectral estimate $S_x(\omega)$ from the results of steps 1
and 2.
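The three Yule-Walker steps can be sketched directly. Everything below is an illustration under assumptions: the autocorrelation in step 1 uses the biased lagged-product estimate (division by N), the normal equations in step 2 are solved by plain Gaussian elimination, and the test record is a noiseless 200-sample cosine at 0.1 cycles per sample. The helper names `solve`, `yule_walker` and `ar_spectrum` are hypothetical.

```python
import cmath
import math

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def yule_walker(x, order):
    n = len(x)
    # Step 1: estimate r_x(l) for l = 0..L (biased estimate, an assumption).
    r = [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(order + 1)]
    # Step 2: substitute into the normal equations and solve for a_1..a_L.
    toeplitz = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    a = solve(toeplitz, r[1:])
    sigma2 = r[0] - sum(al * rl for al, rl in zip(a, r[1:]))
    return a, sigma2

def ar_spectrum(a, sigma2, freq):
    # Step 3: S_x = sigma^2 / |H|^2 with H = 1 - sum a_l exp(-j 2 pi f l).
    h = 1 - sum(al * cmath.exp(-2j * math.pi * freq * (l + 1))
                for l, al in enumerate(a))
    return sigma2 / abs(h) ** 2

record = [math.cos(2 * math.pi * 0.1 * n) for n in range(200)]
a, sigma2 = yule_walker(record, 2)
freqs = [i / 400 for i in range(201)]
spectrum = [ar_spectrum(a, sigma2, f) for f in freqs]
peak_freq = freqs[spectrum.index(max(spectrum))]
print("a =", a, " sigma^2 =", sigma2, " peak at", peak_freq)
```

For a pure sinusoid the second coefficient approaches -1, which places the model poles close to the unit circle and produces the sharp spectral peak.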

In Burg's method, the $a_\ell$ coefficients and autocorrelation values are
estimated simultaneously from the data values using a recursive algorithm.
Once these parameters have been found, the spectral estimate $S_x(\omega)$
is computed as in step 3 above. Burg's method will, in general, provide a
different estimate than that given by the YW procedure. There is some
evidence suggesting that higher resolution may be achieved for the case of
sinusoids in white noise using Burg's algorithm, but this is still an open
question for the case of random data.
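A minimal sketch of a Burg-type recursion is given below, assuming the usual textbook formulation: at each order the reflection coefficient is chosen to minimize the sum of forward and backward prediction error powers over the available samples, and the coefficients are updated as in the order-recursive solution of the normal equations. It is not taken from any Workshop paper, the names `burg` and `ar_spectrum` are hypothetical, and the 40-sample noiseless cosine is only a convenient test record.

```python
import cmath
import math

def burg(x, order):
    """Estimate AR coefficients a_1..a_L (model x[n] ~ sum a_l * x[n-l])."""
    n = len(x)
    forward = list(x)            # forward prediction errors
    backward = list(x)           # backward prediction errors
    coeffs = []
    error_power = sum(v * v for v in x) / n
    for p in range(1, order + 1):
        # Reflection coefficient minimizing forward plus backward error power.
        num = 2.0 * sum(forward[i] * backward[i - 1] for i in range(p, n))
        den = sum(forward[i] ** 2 + backward[i - 1] ** 2 for i in range(p, n))
        k = num / den
        coeffs = [coeffs[l] - k * coeffs[p - 2 - l] for l in range(p - 1)] + [k]
        new_forward, new_backward = forward[:], backward[:]
        for i in range(p, n):
            new_forward[i] = forward[i] - k * backward[i - 1]
            new_backward[i] = backward[i - 1] - k * forward[i]
        forward, backward = new_forward, new_backward
        error_power *= 1.0 - k * k
    return coeffs, error_power

def ar_spectrum(a, sigma2, freq):
    """S_x = sigma^2 / |H|^2 with H = 1 - sum a_l exp(-j 2 pi f l)."""
    h = 1 - sum(al * cmath.exp(-2j * math.pi * freq * (l + 1))
                for l, al in enumerate(a))
    return sigma2 / abs(h) ** 2

record = [math.cos(2 * math.pi * 0.1 * n) for n in range(40)]
coeffs, error_power = burg(record, 2)
freqs = [i / 400 for i in range(201)]
spectrum = [ar_spectrum(coeffs, error_power, f) for f in freqs]
peak_freq = freqs[spectrum.index(max(spectrum))]
print("a =", coeffs, " error power =", error_power, " peak at", peak_freq)
```

No autocorrelation lags are estimated explicitly; they are implied by the reflection coefficients, which is the sense in which the coefficients and autocorrelation values are obtained together.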

The critical set of equations is the normal equations (to determine the
coefficients), which appear frequently in the literature. If the data is only
available sequentially, for example, it is possible to iteratively estimate
the correlation coefficients and sequentially update them and, in turn, the
coefficients. Similar work has been done in adaptive systems by Widrow et al.

To briefly demonstrate the similarity, the MEM states that the fewest
assumptions should be made about the unobserved points. This may be
restated by saying that the spectrum estimated should be maximally random
(maximum entropy) and still be consistent with the observed data. The
equation

$$\text{Entropy} = \int \log P(f)\, df$$

(where intentionally different notation is used) is the entropy of a
Gaussian stationary process. The object is to find a P(f) that maximizes
this entropy and agrees with the measured values of the autocorrelation
function. The MEM solution for this is

[equation illegible in the source]

where P(N+1) and a(n+1) are obtained from

[matrix equation illegible in the source]

This matrix equation may be recognized as being identical to that of the AR
method as well as that resulting from designing an N+1 point predictor.

Prony's Method


This is the oldest of the approaches considered, with the original
method being published by Prony in 1795 (26, 29), the same year as Gauss
presented the theory of least squares estimation. It also has been
rediscovered in recent years due mainly to the work of Van Blaricum, who
will be presenting a review of this technique at this Workshop. A recent
minisymposium dealing with the Prony approach (26) provides a good summary
of the many facets of the method. Being well suited for modal analysis, it
has gained popularity in several communities as a means for estimating the
complex resonances of the system. These include the formulation of a voiced
speech model for use in the linear prediction of speech (29), the
characterization of numerical electromagnetic response data (26), etc. These
resonances of a system are nicely represented in spectral data and may be
considered the natural modes of the system. Whereas Prony's method is
directed at finding the complex poles of the system, which are comprised of
both these natural frequencies and the damping factors, the spectral plot
usually only shows the location of the natural frequencies, which in many
cases will be of sufficient use to the investigator. In this sense, Prony is
more general.

In  summary,  Prony  is  applied  to  estimate  various  types  of  spectral  data 
particularly  those  where  the  interest  is  in  finding  major  resonances. 

The  mathematical  development  of  Prony  has  been  presented  many  times 
and  again  appears  in  the  development  to  be  forthcoming  by  Van  Blaricum. 

For  completeness,  this  same  development  is  repeated  here. 

The Prony approach is based on the well known fact that a system to be
modeled can be represented by

$$R(t) = \sum_{i=1}^{N} A_i\, e^{s_i t}$$

where R(t) is the response, the $s_i$ are the complex poles and the $A_i$
are the corresponding residues. The $s_i$ can be written as
$s_i = \sigma_i + j\omega_i$. The $\sigma_i$ are usually thought of as the
damping constants, and the $\omega_i$ are the natural frequencies in radians
per second. In practice, the measured data usually appears as a set of
discrete data; this equation may be rewritten as

s.nAt 

R(tn)=  Rn=  y Ai  e 1 , n = 0,  1,...,M-1 

i=l 

where \Delta t is the time sample period and M is the total number of samples
taken. This set of equations is M nonlinear equations in 2N unknowns. If
M is equal to or greater than 2N, and if the samples are equally spaced (all
\Delta t equal), then this nonlinear set of equations can be solved using the
Prony algorithm.

Prony further stipulated that the R_n must satisfy a difference equation
of order N which may be written as

    \sum_{p=0}^{N} \alpha_p R_{p+k} = 0,   k = 0, 1, ..., \nu - 1,

where \nu is the value of M-N. The roots z_i of the algebraic equation

    \sum_{p=0}^{N} \alpha_p z^p = 0

define the natural frequencies through the classic s to z transformation

    z_i = e^{s_i \Delta t},   i = 1, 2, ..., N.

If \alpha_N is defined equal to 1, then the remaining \alpha_p's may be
obtained by solving the equation

    \sum_{p=0}^{N-1} \alpha_p R_{p+k} = -R_{N+k}.

If 2N data samples are used, this last equation can be solved exactly for
the \alpha's, since the matrix equation has a unique solution (given certain
conditions of independence). If more than 2N samples are used, the system is
overdetermined and in general has no exact solution, so a least-squares fit
is used to obtain the \alpha's. Once the \alpha_p have been found, the roots
z_i are computed and the poles (major modes or resonances) are then obtained
as

    s_i = \frac{\ln z_i}{\Delta t}.

It is a simple procedure to obtain the residues, A_i, by solving the matrix
equation for R_n once the s_i are known.
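
The steps just described can be sketched in a few lines of Python with NumPy.
This is only an illustrative sketch, not the implementation of any of the
cited authors: the function name prony and its arguments are assumptions
introduced here, the least-squares solver stands in for whatever solution
method an investigator prefers, and noise-free, equally spaced samples are
assumed.

```python
import numpy as np

def prony(samples, n_poles, dt):
    """Fit R_n = sum_i A_i * exp(s_i * n * dt) to equally spaced samples.

    Hypothetical helper for illustration; returns (poles s_i, residues A_i).
    """
    r = np.asarray(samples, dtype=complex)
    m = r.size
    if m < 2 * n_poles:
        raise ValueError("need at least 2N samples")
    nu = m - n_poles
    # Difference equation with alpha_N = 1:
    #   sum_{p=0}^{N-1} alpha_p R_{p+k} = -R_{N+k},  k = 0, ..., nu-1
    rows = np.array([r[k:k + n_poles] for k in range(nu)])
    rhs = -r[n_poles:n_poles + nu]
    # Exact solve when M = 2N, least-squares fit when M > 2N.
    alpha, *_ = np.linalg.lstsq(rows, rhs, rcond=None)
    # Roots of z^N + alpha_{N-1} z^{N-1} + ... + alpha_0 = 0
    # (np.roots expects the highest power first).
    z = np.roots(np.concatenate(([1.0], alpha[::-1])))
    # Poles from the s to z relation z_i = exp(s_i * dt).
    s = np.log(z) / dt
    # Residues from the linear system R_n = sum_i A_i z_i^n.
    vander = np.vander(z, m, increasing=True).T
    a, *_ = np.linalg.lstsq(vander, r, rcond=None)
    return s, a
```

Note that np.log returns the principal value, so the recovered natural
frequencies are only unambiguous when |\omega_i \Delta t| < \pi.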

If  pursued  further,  particularly  in  the  case  of  noise  where  an  exact 
fit  is  not  possible,  the  mathematically  inclined  will  recognize  the  simil- 
arities of  this  method  with  that  of  least  squares,  and  the  fitting  of  data 
using  eigenvectors.  In  the  latter,  the  residual  error  is  equal  to  the  sum 
of  the  remaining  eigenvalues  corresponding  to  the  eigenvectors  not  used  in 
the  fitting  of  the  data,  a relation  which  is  not  uncommon  in  more  detailed 
analysis  of  the  Prony  method. 

In  any  case,  this  discussion  should  suffice  as  an  introduction  for  the 
method,  and  the  brunt  of  the  explanation  of  its  strengths  and  weaknesses 
will  be  left  to  the  paper  to  be  presented  in  the  Workshop. 


Some Newer Techniques


In  recent  years  there  have  been  some  innovative  methods  developed  for 
spectral  estimation.  One  of  the  most  significant  has  been  that  by  Papoulis 
(23)  herein  cited  as  the  Papoulis  algorithm.  This  method  is  applicable  to 
bandlimited  signals.  The  algorithm  is  an  iteration  involving  the  fast 
Fourier  transform.  In  the  reference,  the  convergence  properties,  the 
effects of noise, aliasing, etc. are described in detail. The method may
be  used  to  extrapolate  bandlimited  functions  as  well. 


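Given a bandlimited function f(t), whose transform vanishes for
|\omega| > \sigma, and a segment of it, g_0(t) = g(t), known on the interval
(-T, T), the iteration starts from

    G_0(\omega) = \int_{-T}^{T} g_0(t) e^{-j\omega t} dt.

The nth iteration step proceeds as follows: Form the function

    F_n(\omega) = G_{n-1}(\omega) p_\sigma(\omega),

where

    p_\sigma(\omega) = 1   for |\omega| < \sigma,
                       0   for |\omega| > \sigma,

by truncating G_{n-1}(\omega), and compute its inverse transform

    f_n(t) = \frac{1}{2\pi} \int_{-\sigma}^{\sigma} F_n(\omega) e^{j\omega t} d\omega.

Now form

    g_n(t) = f_n(t) + [f(t) - f_n(t)] p_T(t)
           = g(t)     for |t| < T,
             f_n(t)   for |t| > T,

by replacing the segment of f_n(t) in the interval (-T, T) by the known
segment g(t) of f(t); here p_T(t) is the unit window on (-T, T). Finally,
compute

    G_n(\omega) = \int_{-\infty}^{\infty} g_n(t) e^{-j\omega t} dt

at the nth step.

Note that f_n(t) is bandlimited and given by

    f_n(t) = g_{n-1}(t) * \frac{\sin \sigma t}{\pi t}.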


By  this  continued  resubstitution  and  multiple  use  of  the  FFT  in  trans- 
forming from  one  domain  to  the  other,  the  bandlimited  spectrum  is  estimated. 
In the time domain, since bandlimiting the spectrum is equivalent to
extending the signal (again the uncertainty principle), the extrapolation
of  the  signal  beyond  the  original  duration  interval  provided  may  be  per- 
formed as  well.  The  procedure  has  been  shown  effective  for  bandlimited 
signals.  A closed  form  procedure  similar  in  approach,  has  been  developed 
by  J.  Cadzow,  where  the  numerical  implementation  does  not  require  the 
truncation  of  generally  infinite  time  signals,  and  therefore  avoids  the 
error  producing  truncations.  More  significantly,  in  the  same  work,  Cadzow 
develops  a closed  form  rule  for  generating  the  desired  extrapolation  in  one 
step.  W.  Steenaart  has  also  extended  this  class  of  extrapolation  methods 
of Papoulis to a matrix formulation where the total process is achieved by
one matrix operation, resulting in savings in computation and yielding more
accurate results in some cases. (These last two works are yet unpublished
to my knowledge.)
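
A discrete analogue of the Papoulis iteration may help fix ideas. The sketch
below is an illustration under stated assumptions, not the algorithm as
published in (23): the signal is taken to be a finite, periodic sequence that
is bandlimited in the DFT sense, the FFT stands in for the continuous
transform, and the names papoulis_extrapolate, known_mask, and band_mask are
hypothetical.

```python
import numpy as np

def papoulis_extrapolate(known, known_mask, band_mask, n_iter=200):
    """Iteratively extrapolate a DFT-bandlimited sequence (illustrative sketch).

    known      : array holding the known samples (values elsewhere are ignored)
    known_mask : boolean array, True where the samples are known
    band_mask  : boolean array, True for the DFT bins inside the band
    """
    # g_0: the known segment, zero elsewhere.
    g = np.where(known_mask, known, 0.0).astype(complex)
    f_n = g
    for _ in range(n_iter):
        spectrum = np.fft.fft(g)              # transform of g_{n-1}
        spectrum[~band_mask] = 0.0            # truncate to the band
        f_n = np.fft.ifft(spectrum)           # bandlimited estimate f_n
        g = np.where(known_mask, known, f_n)  # restore the known segment
    return f_n
```

Each pass alternates between enforcing the band limit in the frequency domain
and restoring the known samples in the time domain, which is the
resubstitution described above.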

Another  new  technique  introduced  by  Gray  is  that  of  G-Spectral  esti- 
mation. The  transformation  was  introduced  in  1971  and  the  application  to 
spectral  estimation  was  presented  about  1976-1977  (31).  It  is  especially 
valuable  for  processes  whose  autocorrelation  can  be  expressed  as  a linear 
combination  of  complex  exponentials.  Results  are  still  preliminary  but 
tend  to  show  a smaller  mean  squared  error  in  some  cases.  A more  extensive 
investigation  is  required  to  be  able  to  critically  compare  the  method. 

There  are  many  other  methods  as  well.  Woods  (19)  has  extended  his 
work  to  two  dimensional  processes  and  shows  that  his  Markov  model  relates 
to  the  two  dimensional  maximum  entropy  spectrum.  An  iterative  technique 
for  computing  an  approximation  to  this  spectrum  is  given  as  are  results  for 
real  and  simulated  data.  This  Markov  spectral  estimate  can  offer  higher 
resolution  than  other  spectral  estimates.  A.  K.  Jain  presents  his  work  on 
two  dimensional  spectral  estimation  oriented  to  MEM  in  the  Workshop,  and 
will  cover  this  area  in  more  depth. 

Much additional work has been done. For example, V. K. Jain has provided
an analysis using a mathematical entity called a pencil of functions, to be
described later. Work by David Thomson on Spectral Estimation Techniques
for Characterization and Development of WT4 Waveguide - I (BSTJ, Nov. '77)
treats the case of spectral estimation for short and long records. Again, I
will leave detailed discussion to Thomson in his paper in this Workshop. Of
course, not all work can be covered, and my apologies to those omitted.

Overall,  it  is  clear  that  there  has  been  a major  thrust  in  work  re- 
lating to  spectral  estimation,  using  the  revitalization  of  older  methods 
with  new  twists,  or  by  the  development  of  newer  methods.  It  is  hoped  that 
this  Workshop,  which  additionally  provides  for  the  mechanism  for  comparison 
of  these  techniques,  will  answer  some  of  the  remaining  questions. 

Some  Comparisons 

The similarity between the AR and MEM approaches has been noted several
times, so they are treated as essentially one technique at this point. The
MLM and its relation to these techniques has also been brought out and is
not separately discussed. The main thrust of the Workshop was motivated
by the AR and associated techniques as compared to the more conventional TT
(and related methods), so this is the substance of the comparison.

It  is  fairly  well  accepted  by  many  investigators  that  the  AR  methods 
exhibit  higher  resolution  than  the  TT  counterparts  (Griffiths,  Ulrych, 
Cooper).  To  be  more  exact,  this  statement  must  be  modified  to  include  the 
results  that  show  this  advantage  to  be  true  particularly  for  estimating 
a rational  spectrum  (one  represented  by  a good  all  pole  model).  Not  as 
great  an  advantage  has  been  achieved  with,  for  example,  a Gaussian  defined 
spectrum  although  based  on  results  for  sinusoids  (rational)  in  noise,  it  is 
hoped  that  similar  improvement  can  be  made  in  improved  resolution  for  non- 
rationally  defined  functions.  In  general,  the  AR  methods  will  yield  higher 
resolution for the same number of samples as the TT methods, taking the
above into account.

It  is  also  concluded  in  many  studies  that  the  resolution  is  very  much 
a function  of  the  signal  to  noise  ratio  of  the  process  being  estimated.  It 
has  been  shown  (Marple)  that  the  resolution  is  in  fact  variable  and  is  a 
function of the signal to noise ratio. In contrast, the TT methods are
more independent of noise. This major problem with AR, its sensitivity to
noise, has been to some extent reduced or traded off by recent advances
and improvements, but remains inherent in the approach and I view it as
unavoidable. The characteristic of the approach is to estimate the
coefficients from which the spectrum is derived. These coefficients in
actuality represent parameters in the denominator polynomial of the model
used and naturally will be sensitive to errors (noise), as will anything that
leads to a matrix in the normal equations that is something short of a
correlation matrix, with a weak major diagonal. This phenomenon is well known
in system theory, where similar difficulties are encountered in the estimation
of poles for system identification, and in LPE in statistics and pattern
recognition, to name a few. The AR methods as a result of the above perform
well in estimating spectra with narrow peaks and in a high signal to noise
environment (Kaveh). However, it is also well known that MEM techniques
exhibit line splitting at lower S/N and frequency shifting at moderate S/N
(Fougere).

Moreover, AR methods are particularly well suited for short record
lengths. In fact, it has been stated (Newman) that, to go to the other
extreme, Burg is simply not practical for long records.

In  addition  to  the  classic  advantages  of  AR  methods  for  obtaining 
higher  resolution  on  shorter  records  of  data,  and  the  difficulties  with 
sensitivity  to  noise,  there  are  other  factors  to  consider  not  all  of  which 
can  be  presented  here.  For  example,  given  a situation  where  there  are  two 
neighboring  peaks  of  equal  width  in  the  spectrum  to  be  estimated,  the  AR 
method  will  tend  to  emphasize  the  stronger  of  the  two  signals.  A narrower 
variance  of  the  estimate  will  be  obtained  and  the  results  will  be  misleading 
in  that  they  may  indicate  premature  termination  of  the  estimation  procedure, 
or  reflect  on  the  signals  as  having  different  widths,  strengths,  etc.  The 
AR methods are critically dependent on the number of terms used in the
model, with too many or too few yielding erroneous results. Yet more work
is required to determine a fast and sure way to arrive at the optimum model
representation and order (although the AIC approach has gone a long way in
this direction).

There are of course advantages and disadvantages to be mentioned for
techniques within this class of AR methods. For example, the MEM (or
regressive) technique has been shown to be better than the MLM (Baggeroer),
but develops a noisier estimate. These detailed intraclass comparisons will
not be elaborated further here, since there are too many and, for the most
part, the comments made rely on selected experimental results.

Since they are more familiar, the pros and cons of the TT methods are better
known. Suffice it to say that these approaches are long data oriented,
and exhibit the classic trade-off between time and frequency resolution.

If we have a smooth TT estimate and good resolution, the classic approach
will generally have a better chance (Cooper).

Computationally,  the  AR  and  TT  methods  are  comparable  and  have  been 
shown  to  be  so  on  many  experimental  results  (Cooper,  etc.). 

Overall  with  respect  to  these  approaches,  one  can  present  the  two  sides, 
the  advantages  and  disadvantages,  and  the  user  based  on  his  needs  and 
evaluation  of  his  problem,  pays  his  money  and  takes  his  choice. 

Comparisons can be made to some of the other methods as well, which tend
to bring out the similarities of these approaches by virtue of the
similarities of the difficulties that arise. For example, as it relates to
Prony's method, questions which still need concrete answers include the
success of Prony's method with multiple poles (addressed with some vigor by
Van Blaricum), how to determine the order of the system a priori (this is
exactly the same problem faced in AR with respect to the model), and what the
effects of noise on Prony are, to name just a few (Van Blaricum, in a paper
given in this Workshop). These are obviously not easy questions, since they
have been asked for some 183 years, ever since Prony's method was first
published.

However,  now  comparisons  are  being  made  using  results  to  be  forthcoming 
in  the  symposium  Workshop.  Since  I promised  not  to  steal  the  thunder  of 
the  presenters  to  come  (with  many  of  them  being  able  to  exhibit  more  thunder 
than  I can  muster  at  this  time),  I will  not  say  any  more  but  simply  address 
the  coverage  of  each  of  the  papers  to  be  offered  with  an  attempt  to  indicate 
where  comparisons  will  be  made  along  the  way. 

The  Papers  to  be  Presented 


As  of  this  writing,  some  23  papers  were  scheduled  to  be  presented  in 
the  Workshop.  These  are  divided  into  four  sessions,  with  some  sessions 
dealing  with  more  than  one  subject  area.  The  organization  of  these  papers 
as  well  as  some  selected  comments  follow. 

The first three papers by Marple, Gersch, and Kaveh all deal with the
autoregressive approach. Marple does a comparison of the conventional,
autoregressive, and Pisarenko decomposition methods for the two sinusoid
case with results which tend to support those made in the comparisons of
this paper. Gersch reviews the method and presents some detailed results
and comparisons as well. Kaveh presents a rational spectral model and an
efficient method for computing it and compares his result with the
autoregressive spectral estimator.

The next three papers by McAulay, R. deFigueiredo and Fougere deal with
the maximum entropy or maximum likelihood approaches. McAulay, using state
variables, compares the maximum likelihood estimator with the MEM, and
relates the work to the Kalman filter in an excellent development. Fougere
describes one of the problems of the MEM, that of spontaneous line splitting
in the presence of additive noise at low noise levels, and frequency shifts
at moderate noise levels, and offers some solutions.

The three papers that follow deal in one sense or another with
extrapolative or recursive techniques, and include the works of Papoulis,
Kailath, and Cadzow. Papoulis presents his work on adaptive extrapolation
and hidden periodicities. Kailath discusses recursive spectral estimation;
and Cadzow describes his work on incomplete data observations and
extrapolation as well, with applications to radar.

The next five papers broaden the work of the three major methods cited
above with a wide variety of techniques. Starting out is Van Blaricum, who
presents a comprehensive review of Prony's method, the effects of noise on
the method, and some examples. The use of the pencil-of-functions approach is
covered next by V. K. Jain et al. with applications for system modeling
and identification, another perspective on the problem of spectral
estimation. A. K. Jain then extends the work to two dimensional spectral
estimation using maximum entropy with an iterative algorithmic approach.
Lattice methods in spectral estimation is the topic discussed by Makhoul,
and Thomson describes his work on non-parametric spectral estimates using
Wishart distributions.

The remaining nine papers deal exclusively with applications of
spectral estimation techniques. The first, by Tomlison et al., is concerned
with radar surveillance, comparing MEM and LPC among others. Bowling's
paper also deals with radar, involving the MEM approach for radar imaging.
MEM and its application to Synthetic Aperture Radar (SAR) is handled by
Jackson et al., who give some of the advantages and disadvantages of the
technique for this problem. King and then Landers both consider MEM, the
former applying it to wavenumber power spectra, and the latter to cepstral
analysis. Two more radar oriented papers follow by Bowyer et al., and
Cooper and McGillem, dealing with radar clutter rejection and doppler
spectral estimation respectively. The paper by Tsokos treats the case of
spectral analysis of ionospheric data. Last but not least is the paper by
Carmichael and Wiley, who use classical zero crossing analysis to determine
the results of one of the Workshop problems, and do very well at that.

This  last  paper  is  an  excellent  lead  into  the  comparison  of  the  methods 
on  a common  data  set,  the  subject  of  the  last  day  of  the  Workshop. 

Part II of the Proceedings will be published in the form of a Technical
Report  and  will  compare  the  results  of  the  different  approaches  as  applied 
by  the  investigators  to  the  three  sample  problems  distributed  as  part  of 
the  Workshop.  Together,  these  two  reports  should  provide  the  reader  with  a 
comprehensive  view  of  the  field  and  means  for  evaluation,  comparison  and 
establishing  directions  for  future  research. 

Abstract


The two-sinusoid frequency resolutions of the conventional
Fourier, autoregressive, and Pisarenko decomposition power
spectral density estimates are presented. The conventional
Fourier spectrum analysis method has a resolution which is,
on the average, approximately the reciprocal of the observation
(sample) interval. However, for any case of two sinusoids
of some arbitrarily set initial phases, the resolution may
be much greater or much less than the average resolution.
The Pisarenko decomposition method can, in theory, perfectly
resolve two or more sinusoids if the number of sinusoids
and the autocorrelation function are perfectly known. The
autoregressive (maximum entropy) spectral estimation procedure,
on the other hand, has a resolution that varies as a function
of the signal-to-noise ratio (SNR). Its theoretical resolution
ranges between that of the Pisarenko decomposition and that of
the conventional Fourier methods. From experiments, it has
been found that at a 20 dB SNR, the autoregressive (AR) resolution
performance was about four times that of the conventional Fourier.
At 0 dB SNR, the factor was about twice as good and at -10 dB SNR,
there was almost no difference in the resolution.

Introduction 


A rule of thumb often stated when computing a power spectral
estimate is that the frequency resolution is the reciprocal
of the time interval from which data for the spectrum analysis
has been collected. This rule is based on the uncertainty
relationship between the time duration of a signal, \Delta T, and
the "duration", or extent, of the signal transform, \Delta f [1].
Although the uncertainty relationship is normally concerned
with one signal, its interpretation as a resolution measure
for two sinusoids assumes that the signals can be as close as
\Delta f Hz apart before there is significant overlap of the
transforms that will not permit the two separate responses of
the two sinusoids to be distinguishable. Thus,

    \Delta f \approx 1/\Delta T                                (1)


Conventional analysis implicitly assumes that the signal outside
the window of observation is zero. Any signal is arbitrarily
truncated to have a duration of \Delta T_D seconds, the window width.
For conventional methods, then, \Delta T = \Delta T_D. For the high
resolution analysis methods, it will be shown that they have a
nonzero extension to the observed data. The measured plus extended
data then has an effective time duration \Delta T > \Delta T_D. As such,
the high resolution methods will have better frequency resolution
than the conventional spectral methods. This is the basis for the
high resolution claims of the autoregressive (AR) methods. This
paper quantifies both the theoretical and achievable resolutions
with plots of the resolution performance for two conventional
Fourier methods, the AR method using the Burg algorithm, and the
Pisarenko decomposition procedure.

An illustration of the range of achieved resolutions
using the same data for one conventional and two high-resolution
techniques is shown in Figure 1. The AR-with-noise-power-cancellation
(NPC) method is an approximation to the Pisarenko
decomposition procedure. The details of the AR with NPC are
discussed in reference [2].

Measure  Of  and  Condition  For  Resolution 

In this paper, the condition for two-sinusoid resolution
is defined as the frequency separation \Delta f = |f_1 - f_2| at which
the power spectral density (PSD) evaluated at the center frequency,
S(f_c), where f_c = (f_1 + f_2)/2, is equal to the average of
the PSDs evaluated at the two sinusoid frequencies, i.e.,

    S(f_c) = \frac{1}{2} [S(f_1) + S(f_2)].                    (2)


This definition of resolution, shown in Figure 2, was motivated
by the desire to provide a common method that could be applied
to any method of PSD estimation, since the Pisarenko decomposition
and AR methods do not have the traditional mainlobe
functions from which a 3 dB response width can be measured.

The measure of frequency resolution is given in terms
of a dimensionless quantity R called the normalized resolution

    R = 2\pi M \Delta t \Delta f                               (3)


where \Delta t is the sampling interval in seconds, \Delta f is the
frequency separation in Hz at the point of just being resolved,
and M is the number of autocorrelation lags. The motivation
for the definition of the normalized resolution was taken from
the time-bandwidth product relationship of the uncertainty
principle. As such, for conventional spectrum analysis one
would expect R \approx 2\pi, or about 6.28. To determine resolution
in Hz, one simply looks up the proper R on a normalized resolution
plot for given M and SNR. With R, M, and \Delta t known,
then

    \Delta f = R / (2\pi M \Delta t)  Hz.                      (4)

Note that M \Delta t is the total observation interval.

Conventional  Fourier  Methods  Resolution 

Based on the above definition of R, an analytically derived
normalized resolution for the conventional Fourier method
using known autocorrelation lags is shown on Figure 3. The
details of the lengthy analysis are provided in reference [3].
The conventional Fourier method is the Blackman-Tukey PSD,

    \hat{S}_{BT}(f) = \Delta t \sum_{m=-(M-1)}^{M-1} \phi_m \exp(-j 2\pi f m \Delta t)     (5)

where the \phi_m are the known or estimated autocorrelation lags.
Note that the summation in equation (5) is finite, indicating
the zero extension implied by the conventional method.

Often a weighting is used with the lags to reduce the
effects of sidelobe leakage. A common weighting is the
triangular, or Bartlett, window. When this window is used, the
resolution degrades slightly to that shown in Figure 3. This
illustrates how windowing will reduce sidelobes at the expense
of resolution. Note that both the curves for the Blackman-Tukey
procedure fall below the 6.28 line given by the uncertainty
principle. This indicates it is possible to achieve a
resolution somewhat better than the standard rule of thumb.


The more common conventional Fourier PSD estimate is
the periodogram, where the estimate is computed directly
from the time samples x_m,

    \hat{S}_{PER}(f) = \frac{1}{M \Delta t} \left| \Delta t \sum_{m=0}^{M-1} x_m \exp(-j 2\pi f m \Delta t) \right|^2     (6)

This is most often computed with the FFT, which evaluates
equation (6) at discrete intervals of the frequency parameter
f. Figure 4 summarizes the normalized resolution for this
method.

Figure 4 illustrates that in addition to the observation
duration, the relative phasing between the two sinusoids
is a large factor which determines the periodogram resolution.
Since the periodogram is based on a windowed transform of the
data, windowed sinusoids produce sin f/f functions in the
transform domain. Depending on the initial phases of the sinusoids,
the net transform of two or more windowed sinusoids is the
result of the complex vector constructive and destructive
interference of the sidelobes of the sin f/f functions. This
sidelobe interaction as a function of initial phases of the
two sinusoids has a great effect on the resolution achievable
by the periodogram, as shown in Figure 4. Figure 4 was derived
by running a computer program that determined the range of
resolutions for successive 5° steps in the initial phases of
the two sinusoids. No noise was added since the periodogram
resolution is not a function of the SNR (however, the variance
of the resolution over an ensemble of data sets with noise is
a function of SNR). Therefore, the mean resolution points on
Figure 4 are the average resolution performance for the
periodogram over all possible phases. The average of the means gives
a rough rule-of-thumb for the periodogram resolution of
\Delta f \approx 0.86/\Delta T Hz, where \Delta T = M \Delta t is the
observation interval. This contrasts with the usual 1/\Delta T rule
of thumb normally used.

The scalloped mean resolution curve has been reproduced
on Figure 3. It overlaps the curve of the Blackman-Tukey
PSD using known autocorrelations with Bartlett weighting.
This is not surprising since the periodogram and Blackman-Tukey
procedures yield identical results if the autocorrelation
lag estimates

    \hat{\phi}_n = \frac{1}{M} \sum_{m=0}^{M-1-n} x_{m+n} x_m^*          (7)

for n = 0, ..., M-1, and letting \hat{\phi}_{-n} = \hat{\phi}_n^*, are used in
equation (5). It is easily shown that equation (7) constitutes
a biased estimate of the lags, the bias being that of a Bartlett
weighting.
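
The stated equivalence is easy to check numerically. The following sketch
assumes the usual scalings (a factor \Delta t on the Blackman-Tukey sum and
1/(M \Delta t) on the periodogram) and conjugate-symmetric lags; the function
names are hypothetical and introduced only for this illustration.

```python
import numpy as np

def biased_lags(x):
    """Biased lag estimates phi_n = (1/M) sum_m x[m+n] conj(x[m]), n = 0..M-1."""
    x = np.asarray(x, dtype=complex)
    m_total = x.size
    return np.array([np.sum(x[n:] * np.conj(x[:m_total - n])) / m_total
                     for n in range(m_total)])

def blackman_tukey_psd(lags, freqs, dt):
    """Finite-sum transform of lags phi_0..phi_{M-1}, using phi_{-n} = conj(phi_n)."""
    lags = np.asarray(lags, dtype=complex)
    m_total = lags.size
    m = np.arange(-(m_total - 1), m_total)
    two_sided = np.concatenate((np.conj(lags[:0:-1]), lags))
    kernel = np.exp(-2j * np.pi * np.outer(freqs, m) * dt)
    return dt * np.real(kernel @ two_sided)

def periodogram_psd(x, freqs, dt):
    """Periodogram evaluated directly from the time samples."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    kernel = np.exp(-2j * np.pi * np.outer(freqs, n) * dt)
    return np.abs(dt * (kernel @ x)) ** 2 / (x.size * dt)
```

Feeding biased_lags(x) to blackman_tukey_psd should reproduce
periodogram_psd(x, ...) to within rounding error at any set of frequencies.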


Pisarenko  Decomposition  Resolution 

When a signal is known to consist of pure sinusoids
in white noise, an appropriate procedure to find the unknown
frequencies and powers of the sinusoids in the signal is the
Pisarenko spectral decomposition procedure. In order to perfectly
resolve the spectral components, i.e., decompose the spectrum,
perfect knowledge of M+1 lags of the autocorrelation function
for M/2 sinusoids in white noise is required. Given these,
Frost [4] has formulated a procedure based on Pisarenko's
abstruse paper [5] to find the sinusoid frequencies, summarized
here.


(A)  Determine the smallest eigenvalue \lambda_{min} of
the (M+1) x (M+1) autocorrelation matrix \Phi,
where

    \Phi = \begin{bmatrix}
             \phi_0 & \phi_{-1}  & \cdots & \phi_{-M}   \\
             \phi_1 & \phi_0     & \cdots & \phi_{-M+1} \\
             \vdots & \vdots     & \ddots & \vdots      \\
             \phi_M & \phi_{M-1} & \cdots & \phi_0
           \end{bmatrix}                                       (8)

The minimum eigenvalue is guaranteed to be
the white noise power spectral density,
\lambda_{min} = \sigma_N^2 (see Marple [3] for proof).

(B)  Solve for the eigenvector Y = col[1, Y_1, ..., Y_M]
corresponding to \lambda_{min},

    \Phi Y = \lambda_{min} Y                                   (9)

The vector Y may be related to the coefficients
of an ARMA model of the sinusoids in white
noise process [3].

(C)  The frequencies of the sinusoids are found
by evaluating the roots of the polynomial

    Y^T Z = 1 + Y_1 z + \cdots + Y_M z^M = 0                   (10)

where Z = col[1, z, z^2, ..., z^M]. The roots z_i
are assured of having unit modulus, so that
z_i = \exp(j\omega_i), and the \omega_i are the radian
frequencies of the sinusoids.

A companion  equation  to  the  above  procedure  to  obtain  the  power 
of  each  spectral  component  has  been  given  by  Marple  [3,  4]. 
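
Steps (A) through (C) can be sketched as follows. This is a minimal
illustration assuming exactly known lags \phi_0, ..., \phi_M and a Hermitian
Toeplitz autocorrelation matrix; the function name pisarenko is hypothetical,
and the eigenvector is left unnormalized since scaling it does not move the
polynomial roots.

```python
import numpy as np

def pisarenko(lags):
    """Return (noise power, root angles in radians per sample) from lags phi_0..phi_M."""
    lags = np.asarray(lags, dtype=complex)
    idx = np.arange(lags.size)
    diff = idx[:, None] - idx[None, :]
    # Hermitian Toeplitz matrix with entries phi_{i-k}, using phi_{-n} = conj(phi_n).
    phi = np.where(diff >= 0, lags[np.abs(diff)], np.conj(lags[np.abs(diff)]))
    # (A) eigh returns eigenvalues in ascending order; the first is the smallest.
    eigvals, eigvecs = np.linalg.eigh(phi)
    noise_power = eigvals[0]
    # (B) eigenvector belonging to the smallest eigenvalue.
    y = eigvecs[:, 0]
    # (C) roots of Y_0 + Y_1 z + ... + Y_M z^M (np.roots wants highest power first).
    roots = np.roots(y[::-1])
    return noise_power, np.angle(roots)
```

For M/2 real sinusoids the roots occur in conjugate pairs, so the positive
angles give the sinusoid frequencies.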

The perfect resolution of the Pisarenko decomposition (PD)
procedure is depicted as the line of R=0 in Figure 3. However,
in practice one usually estimates the autocorrelation lags
from data. Also, the number of sinusoid components is not
typically known a priori. These two factors introduce error
in the procedure of equations (8)-(10) and prevent the
achievement of perfect resolution. A procedure known as AR with noise
power cancellation is able to approach the potential performance
of the PD without the need to solve an eigenproblem and a
polynomial root problem. This method is discussed in another
paper [2].

Autoregressive Method Resolution


Many claims have been made about the superior resolution
of the autoregressive (AR) method, alias maximum entropy method
(MEM). In the original MEM given by Burg [6], known
autocorrelation lags were used to compute the PSD estimate given
by

    S_{AR}(f) = \frac{p \Delta t}{\left| 1 + \sum_{m=1}^{M} a_m \exp(-j 2\pi f m \Delta t) \right|^2}     (11)

The scalar p and the AR coefficients {a_m} are found by
solving the matrix equation

    \Phi A = P                                                 (12)

where \Phi is given by equation (8), A = [1, a_1, ..., a_M]^T, and
P = [p, 0, ..., 0]^T.

It can be shown [3] that the AR PSD may be equivalently
written as

$$S_{AR}(f) = \Delta t \sum_{n=-\infty}^{\infty} r_n \exp(-j2\pi f n \Delta t) \qquad (13)$$

where $r_n$ is the known autocorrelation lag for $|n| \le M$ and

$$r_n = -\sum_{m=1}^{M} a_m r_{n-m} \quad \text{for } |n| > M. \qquad (14)$$

Thus, the AR PSD estimate has no implied window like the
conventional Fourier methods (see equation 5) since the AR PSD
is equivalent to a conventional method with infinite summation
limits. No window means no sidelobe leakage phenomena with
the AR PSD. Also, the lag extension given by equation (14)
gives the effective time duration $\Delta T > \Delta T_0$, meaning a higher
resolution than the conventional PSD.
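
As a hedged illustration of equations (11) and (12), the sketch below solves the matrix equation with the Levinson recursion (a standard solver for this Toeplitz system, used here as an assumption since the paper does not name one) and evaluates the AR PSD on a frequency grid. The function names, the unit sampling interval, and the example lags (one real sinusoid at one quarter of the sampling frequency plus white noise of variance 0.1) are illustrative and not taken from the paper.

```python
import cmath
import math

def levinson(r, M):
    """Solve the order-M Yule-Walker system for real lags r[0..M].

    Returns (a, p): a = [1, a_1, ..., a_M] and the scalar p of equation (12).
    """
    a = [1.0]
    p = r[0]
    for m in range(1, M + 1):
        acc = sum(a[k] * r[m - k] for k in range(m))
        refl = -acc / p                      # reflection coefficient
        ext = a + [0.0]
        a = [ext[i] + refl * ext[m - i] for i in range(m + 1)]
        p *= 1.0 - refl * refl               # updated prediction error power
    return a, p

def ar_psd(a, p, f, dt=1.0):
    """AR PSD of equation (11) at frequency f (cycles per unit time)."""
    denom = sum(a_m * cmath.exp(-2j * math.pi * f * m * dt)
                for m, a_m in enumerate(a))
    return p * dt / abs(denom) ** 2

# Hypothetical known lags: r_k = cos(pi*k/2) + 0.1 at lag zero
sigma2 = 0.1
r = [math.cos(0.5 * math.pi * k) + (sigma2 if k == 0 else 0.0) for k in range(3)]
a, p = levinson(r, 2)

freqs = [i / 200.0 for i in range(101)]      # 0 to 0.5 cycles per sample
psd = [ar_psd(a, p, f) for f in freqs]
peak = freqs[psd.index(max(psd))]
print(a, p, peak)
```

For this symmetric example the spectral peak falls at the sinusoid frequency; for other frequencies and low SNR the peak location and width depend on the order M, which is the resolution behavior discussed in the text.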

An analytical treatment of equation (11) using the
autocorrelation function for two equal amplitude sinusoids in
white noise provided the mean resolutions shown in Figure 3
as a function of SNR. For very high SNR, the performance
approaches that of the PD procedure. For very low SNR,
the performance asymptotically approaches that of the
conventional Blackman-Tukey procedure.

Figure 5 replots the normalized resolution curves to
a different scale format. An empirical equation which fits
these composite curves is

$$R = 6.471\,[\mathrm{SNR}\,(M + 1)]^{-0.31} \qquad (15)$$

Note  that  the  resolution  is  proportional  to  a power  of  the 
product  of  the  SNR  (in  linear  rather  than  dB  units)  and 
the  number  of  lags. 
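
A short sketch of how the empirical fit of equation (15) might be evaluated follows. The exponent is printed illegibly in the scanned source; the value -0.31 used here is an assumption, and the function name and the example SNR and lag values are illustrative only. The SNR is converted from dB to linear units, as the text requires.

```python
def normalized_resolution(snr_db, M, coeff=6.471, exponent=-0.31):
    """Empirical normalized resolution R of equation (15).

    exponent=-0.31 is an assumed reading of the garbled exponent in the source.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)     # dB to linear power ratio
    return coeff * (snr_linear * (M + 1)) ** exponent

# Illustrative values: R shrinks (resolution improves) as SNR or M grows
for snr_db in (0, 10, 20):
    print(snr_db, round(normalized_resolution(snr_db, M=16), 3))
```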

When only data samples rather than the autocorrelation
lags are available, the resolution performance is shown in
Figure 6. The AR PSD is determined using the Burg algorithm
[6] to determine the AR coefficients by a least squares
estimation procedure. Many other algorithms are available to
determine the AR coefficients. Makhoul [7] has formulated
a lattice method algorithm. Morf et al. [8] have formulated
a ladder method algorithm. Time did not permit testing these
techniques.

Two  sinusoids  were  used,  each  of  zero  degrees  initial 
phase,  at  three  SNR  levels.  The  zero  phase  condition  was 
found  to  be  about  the  worst  case  resolution.  The  periodogram 
resolution  is  shown  for  comparison.  A large  ensemble  of  data 
sets  was  run  in  order  to  determine  the  mean  resolution  and 
variance, as illustrated in Figure 6. From the figure, at
20 dB SNR the resolution of the AR method using the Burg
algorithm is about four times better than that of the
periodogram. At 0 dB SNR, the improvement factor is about two.

As the initial phases of the sinusoids change, the
resolution plots of the AR and periodogram PSD techniques tend to
track each other.

In the intermediate case, in which fewer AR coefficients
than  data  points  are  computed,  the  performance  is  shown  in 
Figure  7.  As  expected,  an  increase  in  the  number  of  data 
points  for  fixed  number  of  AR  coefficients  (4  in  this  case) 
brings  the  normalized  resolution  closer  to  the  theoretical 
performance  given  in  Figure  3. 

The  improvement  in  AR  resolution  illustrated  thus  far 
is  perhaps  optimistic.  In  practical  situations,  often  no 
more  than  half  as  many  AR  coefficients  as  there  are  data 
points  are  calculated.  This  is  done  to  minimize  the  extent 
of  spurious  responses  in  the  AR  spectra  and  to  reduce  the 
effect  of  spectral  line  splitting.  Using  fewer  AR  parameters 
than  data  values  reduces  the  relative  improvement  of  the  AR 
spectrum over conventional spectra. As an example taken from
Figure 3, consider the case when the SNR is -5 dB and 64 data
samples are taken. The normalized resolution for the
periodogram (FFT) for 64 points is about 5.4. The normalized
resolution of the AR PSD for 32 coefficients and -5 dB is
approximately 2.8. The relative resolution for this case is

$$\frac{R_{FFT}/(2\pi \cdot 64\,\Delta t)}{R_{AR}/(2\pi \cdot 32\,\Delta t)} = \frac{5.4}{2(2.8)} \approx 0.96$$

Since this ratio is less than one, the frequency resolution of
the AR PSD estimate will, on the average, be worse than the
resolution of a conventional FFT analysis. Thus, for low SNRs,
there is normally no advantage to using an AR spectral
estimate.
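
The arithmetic of this example can be checked with a few lines. The sketch assumes the relation R = 2*pi*M*dt*df between normalized resolution and frequency resolution (the axis label of Figure 4), so that df = R / (2*pi*M*dt); the function name is illustrative, and the sampling interval cancels in the ratio.

```python
import math

def frequency_resolution(R, M, dt=1.0):
    """Frequency resolution df implied by normalized resolution R = 2*pi*M*dt*df."""
    return R / (2.0 * math.pi * M * dt)

df_fft = frequency_resolution(5.4, 64)   # periodogram, 64 points, -5 dB
df_ar = frequency_resolution(2.8, 32)    # AR PSD, 32 coefficients, -5 dB
ratio = df_fft / df_ar
print(round(ratio, 3))                   # below 1: the FFT resolves finer here
```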

SUMMARY 

The conventional Fourier methods have a mean resolution that
is a function of the type and duration $\Delta T$ of the window
function. With the concept of resolution introduced in this paper, the
frequency resolution is $\Delta f \approx 0.86/\Delta T$. The AR method has a
resolution that degrades to that of the conventional methods as the
SNR decreases. Perfect resolution is obtainable with the PD
method if the autocorrelation function is perfectly known or
accurately estimated.

FIGURE 2. Illustration of Condition of Resolution. Four panels of
PSD response versus frequency: (A) Well-resolved: peaks at sinusoid
frequencies. (B) Near condition of resolution: peaks shift away
from true frequencies. (C) Just-resolved: peaks shift toward center.
(D) Not resolved: frequencies too close to be resolved.

FIGURE 3. Normalized Resolution for Conventional,
Autoregressive, and Pisarenko Methods. Horizontal axis: M, number
of autocorrelation lags (number of AR coefficients = M-1).

FIGURE 4. Periodogram (FFT) Resolution Variance As A Function
of Initial Phase. Axis: normalized resolution, $R = 2\pi M \Delta t \Delta f$.

Normalized Resolution Using Data Samples Rather
Than Known Autocorrelation Lags.


AUTOREGRESSIVE  MODEL  SPECTRAL  ESTIMATION, 

SOME  SIMULATION  STUDY  STATISTICAL  PERFORMANCE  RESULTS. 

ABSTRACT


Some results of a simulation study of the statistical performance of
autoregressive (AR) modeled spectral density and spectral coherence
estimation are shown. The AR model order was determined by Akaike's AIC
information theoretic criterion. Sharply peaked spectrum and smooth spectrum
situations were considered. In the smooth spectrum situation, the statistical
performance of the AR modeled spectral density and spectral coherence
estimates in the vicinity of zero coherence may be conservatively approximated
by ν = N/pmin degrees of freedom, where N is the number of data points and
pmin is the order of the AR model fitted to the data. Spectral estimation
is poorest in the neighborhood of sharp spectral peaks for AR modeled
spectral estimation, as it is for conventional windowed periodogram
estimation. For sharp spectra, with large values of N,
estimation of the spectral troughs and zero coherence appears to be well
approximated by the ν = N/pmin property. Evidence for the asymptotic
unbiasedness and consistency of AR modeled spectral estimation is also shown.

INTRODUCTION 

There is an increasing interest in autoregressive (AR) parametric model
methods for the spectral analysis of stationary time series data. The
methods by which AR models are computed and the methods by which the AR model
order is "best" fitted to the observed data are being actively researched.

The statistical properties of spectral estimates obtained by AR models fitted
to stationary time series data have received less attention. Only limited
theoretical results on the statistical properties of AR modeled spectral
estimates are known or conjectured, [1]-[6]. The status of the subject and our
interest in particular applications in which AR modeled spectral analysis
has an important role, [7]-[10], motivated a simulation study of some of the
statistical properties of AR modeled spectral estimates in stationary vector
time series.

We considered AR modeled spectral estimates by the Whittle-Akaike
recursive computation-AIC criterion model order method, [11]-[13], [9]. In
that method, increasing order AR models are recursively fitted to the sample
covariance data using Whittle's algorithm. The "best" AR model fitted to
that data is the AR model whose order is determined by Akaike's AIC
criterion. The AIC criterion selected AR model has an asymptotic minimum
prediction variance property.

The phenomenology of AR spectral estimation differs from that of
conventional windowed periodogram spectral estimation. Some of that
phenomenology, as well as some results on the asymptotic unbiasedness and
consistency of the AR modeled spectral density and spectral coherence
estimation, are reported in this paper.

THE  AR  MODEL  AND  THE  METHOD 

N consecutive samples of d simultaneous time series {x(t); t = 1, ..., N},
from an assumed covariance stationary time series are observed. The sample
mean is deleted from each of the time series and the d×d sample matrix
covariance function

$$C_{xx}(k) = \frac{1}{N} \sum_{t=1}^{N-k} x(t+k)\,x(t)'; \quad k = 0, 1, \ldots, L, \qquad (1)$$

is computed from the remaining time series. In (1), and subsequently, '
denotes the matrix transpose operation. L, the maximum number of lags
considered for analysis, may be approximated by the empirically determined bound
$L \le 3\sqrt{N}/d$.
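
A minimal sketch of the sample covariance computation of Eq. (1) and the lag bound follows. The 1/N normalization is an assumption (the leading factor is not legible in the scanned equation), and the function names and the three-sample, two-channel data set are hypothetical.

```python
import math

def sample_covariance(x, L):
    """Matrix covariance function C[k][i][j] for lags k = 0..L.

    x is a list of N samples, each a list of d values. The sample mean is
    removed per channel. The 1/N normalization is an assumed convention.
    """
    N, d = len(x), len(x[0])
    means = [sum(row[i] for row in x) / N for i in range(d)]
    xc = [[row[i] - means[i] for i in range(d)] for row in x]
    return [[[sum(xc[t + k][i] * xc[t][j] for t in range(N - k)) / N
              for j in range(d)] for i in range(d)] for k in range(L + 1)]

def max_lag(N, d):
    """Empirical bound L <= 3*sqrt(N)/d, rounded down to an integer."""
    return int(math.floor(3.0 * math.sqrt(N) / d))

# Hypothetical data: N = 3 samples of a d = 2 vector series
x = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
C = sample_covariance(x, 1)
print(C[0], C[1], max_lag(1000, 2))
```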

An AR model of {x(t)} of order p is

$$x(t) = -\sum_{i=1}^{p} A^{(p)}(i)\,x(t-i) + \varepsilon(t), \qquad (2)$$

where the $A^{(p)}(i)$ are d×d matrices and $\varepsilon(t)$ is a zero-mean d-vector
innovations process with covariance matrix V. The AR model coefficients
satisfy the Yule-Walker equations,

$$\sum_{i=0}^{p} A^{(p)}(i)\,C_{xx}(j-i) = 0; \quad j = 1, \ldots, p, \quad A^{(p)}(0) = I, \qquad V_p = \sum_{i=0}^{p} A^{(p)}(i)\,C_{xx}'(i). \qquad (3)$$

The order of the "best" AR model fitted by Akaike's AIC criterion, pmin,
satisfies

$$p_{\min} = \arg\min_{p}\,\{N \log|V_p| + 2d(dp+1)\}, \quad p = 0, 1, \ldots, L. \qquad (4)$$
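
The order selection rule of Eq. (4) can be sketched for the scalar case d = 1. The paper uses Whittle's multichannel recursion; the scalar Levinson recursion below is a stand-in chosen to keep the example short, and the function names, N, L and the exact first-order AR lags are hypothetical.

```python
import math

def residual_variances(c, L):
    """Residual variances V_0..V_L from scalar lags c[0..L] (Levinson recursion)."""
    a = [1.0]
    V = [c[0]]
    for m in range(1, L + 1):
        acc = sum(a[k] * c[m - k] for k in range(m))
        refl = -acc / V[-1]
        ext = a + [0.0]
        a = [ext[i] + refl * ext[m - i] for i in range(m + 1)]
        V.append(V[-1] * (1.0 - refl * refl))
    return V

def aic_order(c, N, L, d=1):
    """Return (pmin, AIC values) with AIC(p) = N*log(V_p) + 2*d*(d*p + 1)."""
    V = residual_variances(c, L)
    aic = [N * math.log(V[p]) + 2 * d * (d * p + 1) for p in range(L + 1)]
    return aic.index(min(aic)), aic

# Hypothetical exact lags of a first-order AR process with coefficient 0.5
c = [0.5 ** k / 0.75 for k in range(4)]
p_min, aic = aic_order(c, N=1000, L=3)
print(p_min, [round(v, 2) for v in aic])
```

Beyond the true order the residual variance stops decreasing, so the penalty term 2d(dp+1) makes the criterion rise again, which is what fixes the minimum.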


The estimate of the d×d spectral density matrix S(f) at frequency f is
computed using the known formula, see [6] for example,

$$S(f) = \left[I + A_1 e^{-j2\pi f} + \cdots + A_{p_{\min}} e^{-j2\pi p_{\min} f}\right]^{-1} V_{p_{\min}} \left[I + A_1' e^{j2\pi f} + \cdots + A_{p_{\min}}' e^{j2\pi p_{\min} f}\right]^{-1}, \qquad (5)$$

with $A_k = A^{(p_{\min})}(k)$. The square of the spectral coherence at frequency f
between the time series $x_i(t)$, $x_j(t)$ is defined by

$$W_{ij}^2(f) = \frac{S_{ij}(f)\,S_{ij}^*(f)}{S_{ii}(f)\,S_{jj}(f)}; \quad i, j = 1, \ldots, d, \quad i \ne j, \qquad (6)$$

where the average cross-power spectral density, $S_{ij}(f)$, is the ij element of
S(f).
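
Equations (5) and (6) can be sketched for d = 2 with plain complex arithmetic. The sketch assumes a unit sampling interval and real coefficient matrices; the function names and the example matrices A and V are hypothetical and are not the structural or EEG models of the paper.

```python
import cmath
import math

def ar_spectral_matrix(A, V, f):
    """2x2 spectral density S(f) = H^{-1} V H^{-H}, H = I + sum_k A_k e^{-j2 pi f k}.

    A is a list [A_1, ..., A_p] of real 2x2 matrices; V is the real 2x2
    innovations covariance. Unit sampling interval is assumed.
    """
    H = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k, Ak in enumerate(A, start=1):
        w = cmath.exp(-2j * math.pi * f * k)
        for i in range(2):
            for j in range(2):
                H[i][j] += Ak[i][j] * w
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    Hinv = [[H[1][1] / det, -H[0][1] / det],
            [-H[1][0] / det, H[0][0] / det]]
    T = [[sum(Hinv[i][k] * V[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Right-multiply by the conjugate transpose of Hinv
    return [[sum(T[i][k] * Hinv[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def squared_coherence(S):
    """Squared coherence |S_12|^2 / (S_11 * S_22) of Eq. (6)."""
    return abs(S[0][1]) ** 2 / (S[0][0].real * S[1][1].real)

# Hypothetical first-order model with one-way coupling between channels
A = [[[-0.5, 0.2], [0.0, -0.3]]]
V = [[1.0, 0.0], [0.0, 1.0]]
coh = squared_coherence(ar_spectral_matrix(A, V, 0.1))
print(coh)
```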


Figure 1 shows AR modeled and Parzen windowed spectral analysis results
for a d=2 vector time series. The data analyzed corresponds to simulation of
an AR-MA model of the random vibrations of a two degree-of-freedom structural
system, [8], with N=1000. The AR model equivalent to a finite order AR-MA
model is known to be of infinite order. The spectrum and coherence
estimates achieved by the AR model fitted to the simulated data (Fig. 1) are
visually indistinguishable from the theoretical AR-MA model spectrum and
coherence. The lag-50 Parzen windowed spectral analysis has reasonable
coherence estimation performance but inadequate resolution. Also, the Parzen
window estimate of the major spectral peak is significantly under-biased.
Increasing the spectral resolution by increasing the lag number would
increase the variance of the estimates. That would introduce "bumpiness" in
the spectrum removed from the peak and in the coherence. Thus, Figure 1
illustrates the known results that the spectral estimation of time series
with sharp spectral peaks is difficult with conventional windowed
periodogram methods and that parametric AR modeled spectral analysis can have
relatively sharp resolution. These results motivated a statistical performance
study of AIC order determined AR modeled spectral analysis of a "worst
case" sharp spectral peak time series example. A more familiar, "smooth"
spectrum situation was also considered.

RESULTS


A:  Sharp  Spectral  Peak  Case 

An AR-MA model with d=2, corresponding to the random vibrations of a two
degree of freedom structure, was used as the simulation model, [8]. Ten
trials with N=500, and twelve simulation trials with N=1000, and N=2000 data
points were simulated. Table I shows the average and standard deviation of
the AIC criterion determined AR model order, pmin, for the simulation trials.


TABLE I: pmin Average and pmin
Standard Deviation for Different Simulation Data Lengths

                       N=500    N=1000    N=2000
pmin average            5.82     6.08      6.67
pmin std. deviation     1.83     1.73       .65

As indicated in Table I and as expected, on the average, the order of the AR
model fitted to a sample function of a stationary time series of presumed
infinite order increases, and the variability of the AR model order decreases,
with increasing data length N. For fixed N, the determinant of the matrix of
the residuals, Vp, is a non-increasing function of p. In the vicinity of
p=0, Vp decreases sharply for increasing values of p and then tends toward a
constant value, the innovations matrix variance, with increasing values of p.
This effect, plus the linear increase with p of the term 2d(dp+1) in Eq. (4),
accounts for the average increase of pmin with increasing N.

Figure 2a, b show the theoretical values and means, and Figure 2c, d the
mean and mean ± std. deviation, of one of the two simulated time series AR
modeled average power spectral densities for N=500 and N=2000. These results
are compatible with the theoretical results which suggest that the modeled
average power spectral density tends to be unbiased and consistent.

In conventional windowed periodogram spectral analysis, the spectral
estimate at frequency f is approximately distributed as a constant times a
chi-square random variable with ν degrees of freedom, $a\chi^2_\nu$, where
$a = S(f)/\nu$ and $\nu \approx 2S^2(f)/\mathrm{var}\,\hat{S}(f)$, [14]. The
only "known" related result on the behavior of AR modeled spectral estimates
is a conjecture by Parzen, [3], that AR modeled spectral estimates are
asymptotically distributed in accordance with a complex Wishart distribution with
ν = N/pmin df.


Accordingly, to study the empirical statistical properties of AR
spectral estimates, define $\nu_T(f)$, $\nu_B(f)$, $\nu_V(f)$ respectively as the
empirical total equivalent number of degrees of freedom (df), the df due to
bias and the df due to variance at frequency f. These are given by

$$\nu_T(f) = \frac{2S^2(f)}{\mathrm{Var}\,\hat{S}(f) + (\bar{S}(f) - S(f))^2}, \quad \nu_B(f) = \frac{2S^2(f)}{(\bar{S}(f) - S(f))^2}, \quad \nu_V(f) = \frac{2S^2(f)}{\mathrm{Var}\,\hat{S}(f)}. \qquad (7)$$

In Eq. (7), $S(f)$, $\hat{S}(f)$, $\bar{S}(f)$, and $\mathrm{Var}\,\hat{S}(f)$ are respectively the theoretical
value, the estimated value, the mean and the variance of the spectral density
at frequency f. Table 2 is a list of the quantities $\nu_B$, $\nu_V$ and $\nu_T$ at the
zero, first peak, trough, second peak and end point of the spectral density
for N=500 and N=2000 for the time series modeled in Figure 2.
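
The three equivalent degrees of freedom of Eq. (7) can be computed from an ensemble of spectral estimates at one frequency, as in the sketch below. It assumes the numerator 2S²(f) in all three definitions and a population (1/n) variance; the function names and the four sample estimates are hypothetical. Under these definitions 1/ν_T = 1/ν_B + 1/ν_V, which reproduces, for example, the zero-frequency and trough entries of Table 2 for N=500 from their bias and variance columns.

```python
def equivalent_df(estimates, true_value):
    """Return (nu_T, nu_B, nu_V) of Eq. (7) at a single frequency.

    Assumes a nonzero empirical bias and variance (otherwise the df is unbounded).
    """
    n = len(estimates)
    mean = sum(estimates) / n
    var = sum((s - mean) ** 2 for s in estimates) / n    # population variance
    bias_sq = (mean - true_value) ** 2
    scale = 2.0 * true_value ** 2
    return scale / (var + bias_sq), scale / bias_sq, scale / var

def combine(nu_B, nu_V):
    """Total df implied by the bias and variance df: 1/nu_T = 1/nu_B + 1/nu_V."""
    return 1.0 / (1.0 / nu_B + 1.0 / nu_V)

# Hypothetical ensemble of four estimates of a true spectral value 1.0
nu_T, nu_B, nu_V = equivalent_df([0.9, 1.1, 1.0, 1.2], 1.0)
print(nu_T, nu_B, nu_V)

# Consistency check against two Table 2 rows (N=500)
print(round(combine(749, 73)), round(combine(163, 60)))
```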


TABLE 2: Equivalent Degrees of
Freedom of the Spectral Estimates

                        N=500                    N=2000
                  νB      νV      νT       νB       νV      νT
zero             749      73      67      8432     304     293
first peak        16       2       2      7945      17      17
trough           163      60      44      1568     585     425
second peak  [illegible]  55      19      2633      54      47
end point        346      43      41     14162     150     148

The following observations were abstracted from Table 2 and Figure 2:

(i) For fixed N, df varies considerably as a function of frequency.

(ii) For fixed N, df decreases as the sharpness of the spectral peaks
increases.

(iii) For fixed N, $\nu_T$ is larger at the troughs than at the peaks.

(iv) For increasing N, $\nu_T$ may be approximated at the troughs by N/pmin
(500/5.82 = 86, 2000/6.67 = 300).

(v) For increasing N, $\nu_T$ becomes dominated by the variance of the
spectral estimates ($\nu_V$).

Result (i) is a "known" property of AR modeled spectral analysis. It holds
for smooth as well as sharply peaked spectra. For smooth spectra,
conventional spectral analysis yields relatively constant values of ν. Results
(ii) and (iii) are similar to those known for conventional spectral analysis.
Result (iv) suggests that for sufficiently large N, the equivalent number of
degrees of freedom, $\nu_T$, at the spectral troughs may be conservatively
estimated by ν ≈ N/pmin. Result (v) is consistent with Parzen's claim that AR
spectral estimates tend to be (relatively) unbiased.

The sample properties of windowed periodogram spectral coherence
estimates were treated by Akaike and Yamanouchi [15], Jenkins and Watts [14],
Brillinger [16] and Koopmans [17]. These results may be summarized by, [16],

E[W2(f)]  = W2(f)  + 0(BW)  + O(^) 

Var  W2(f)  = -riy  W2(f)  (1  - W2(f))2  + 0(— \-j)  (8) 

vl  BW^N 

where BW denotes "bandwidth" and $\hat{W}^2(f)$ denotes the estimate of the spectral
coherence at frequency f. From Eq. (8), it is seen that near zero coherence,
the coherence estimates are dominated by bias errors and that as N increases,
the variance of the conventional coherence estimate tends to zero and the
bias tends to a constant O(BW). Figure 3 shows the results for the
theoretical, mean and mean ± standard deviation of the spectral coherence for N=500
and N=2000 for the sharp spectrum example. The coherence estimates achieved
by the AR model spectral estimates tend to be unbiased and consistent. The
poorest statistical behavior is in the vicinity of $W^2(f)$ equal to zero.
Following Amos and Koopmans, [18], an exact expression for the distribution
of $z = \sqrt{\hat{W}^2(f)}$ when $W^2(f) = 0$ is

$$F(z) = 2(\nu - 1)\,z\,(1 - z^2)^{\nu-2}. \qquad (9)$$

Using the integral expression for the beta function, the mean and variance of
$\hat{W}^2(f)$ when $W^2(f) = 0$ become

$$E[\hat{W}^2(f)\,|\,W^2(f)=0] = \frac{1}{\nu}, \qquad \mathrm{Var}[\hat{W}^2(f)\,|\,W^2(f)=0] = \frac{1}{\nu^2}\left(1 - \frac{2}{\nu+1}\right). \qquad (10)$$
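
The moments that follow from the stated density F(z) = 2(ν-1) z (1-z²)^(ν-2) can be checked numerically. The sketch below integrates z² and z⁴ against the density with a midpoint rule; the function name, the choice ν = 10 and the number of integration steps are illustrative assumptions. The closed forms 1/ν for the mean and (ν-1)/(ν²(ν+1)) for the variance are those of a beta variable with parameters (1, ν-1), which is what the density implies for z².

```python
def zero_coherence_moments(nu, n=20000):
    """Mean and variance of z**2 under F(z) = 2(nu-1) z (1 - z**2)**(nu-2), 0 <= z <= 1."""
    h = 1.0 / n
    m1 = m2 = 0.0
    for i in range(n):
        z = (i + 0.5) * h                                  # midpoint rule
        dens = 2.0 * (nu - 1.0) * z * (1.0 - z * z) ** (nu - 2.0)
        m1 += z ** 2 * dens * h
        m2 += z ** 4 * dens * h
    return m1, m2 - m1 * m1

nu = 10.0
mean, var = zero_coherence_moments(nu)
print(mean, 1.0 / nu)
print(var, (nu - 1.0) / (nu ** 2 * (nu + 1.0)))
```

For large ν the variance is close to 1/ν², so both the mean and the square root of the variance of the zero-coherence estimate can be inverted to give an equivalent number of degrees of freedom.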

Using the results of Eq. (10) as a guide, an approximate equivalent number of
degrees of freedom was computed for the estimate of zero coherence for the
AR modeled spectral estimates based on the bias and variance of the coherence
estimates in the vicinity of f = 0.2
($\nu_B(f) = (\bar{W}^2(f))^{-1}$; $\nu_V(f) = (\mathrm{Var}\,\hat{W}^2(f))^{-1/2}$).
These results, with the Parzen conjectured df, are in Table 3.


TABLE 3: Degrees of Freedom for
Zero Coherence Estimates


The data in Table 3 indicate that the df estimated by the bias and the
variance of the zero coherence estimates are very similar. This evidence
supports the notion of an equivalent number of degrees of freedom for the
estimation of zero coherence.

B:  Smooth  Spectral  Density  Case 

For this situation an AR model of pmin=7 fitted to an
electroencephalogram (EEG) time series with d=3 was selected as the simulation model. Ten
realizations each for data lengths N=200 and N=800 were simulated. Figures
4a, 4b show the mean spectral density and the mean ± standard deviation of
the spectral density computed from the synthesized data respectively for
N=200 and N=800 for one time series. Figures 4c, 4d show the mean coherence
and mean ± standard deviation of the coherence computed for two data channels
for N=200 and N=800 respectively. The true model values of the spectral
density and spectral coherence are extremely close to the mean values for N=800
in Figures 4b, 4d.

The illustrations of the spectral density and spectral coherence for the
smooth spectral density estimates also suggest that AR spectral estimates are
asymptotically unbiased and consistent. In contrast with the sharply peaked
spectrum situation, in the smooth spectrum situation, the largest spectral
deviations occur at the troughs of the spectrum. Using the notion of the df
due to variance, $\nu_V$, defined in Eq. (7), the df for N=200 and 800
respectively were 30 and 150. Correspondingly, the Parzen df estimates are 29 and
114. Using the definition of df for coherence estimates, Eq. (8), we
obtained $\nu_{200} \approx 30$, $\nu_{800} > 120$ in the vicinity of the estimate of zero
coherence. Thus ν = N/pmin appears to be a reasonably conservative estimate to
use in evaluating smooth spectrum AR spectral estimate performance.
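
The Parzen degrees-of-freedom figures quoted in the text follow directly from ν = N/pmin. The short sketch below recomputes them from the data lengths and model orders stated in the paper (the average pmin values of Table I for the sharp spectrum case, and pmin=7 for the smooth case); the variable names are illustrative.

```python
# (N, pmin) pairs taken from the text: sharp-peak case, then smooth case
cases = [(500, 5.82), (2000, 6.67), (200, 7), (800, 7)]

# Parzen's conjectured equivalent degrees of freedom, nu = N / pmin
parzen_df = [N / p for N, p in cases]
for (N, p), nu in zip(cases, parzen_df):
    print(N, p, round(nu))
```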

SUMMARY 

Classical windowed periodogram spectral analysis is an empirical
procedure. In that procedure, "window carpentry" is subjectively performed to
balance bias and variance errors. The empirical-subjective nature of
spectral analysis is generally ignored, and frequently users of spectral analysis
subroutines do not have sufficient expertise to properly interpret the
results of their computations. In contrast with the classical method of
spectral analysis, by the use of Whittle's recursive computational procedure
and Akaike's AIC criterion, autoregressive model spectral estimates can be
computed by an automatic computational procedure that has nice statistical
properties.

Recent theoretical results on AR modeled spectral analysis estimates
suggest that they are asymptotically unbiased and consistent and that their
variance is as low as that achieved by the best windowed periodogram spectral
estimators. The empirical evidence obtained from simulation trials shown
here suggests that Akaike AIC criterion-AR model spectral estimates are
asymptotically unbiased and consistent. The evidence also supports Parzen's
conjecture that asymptotically, for smooth spectrum situations, the number of
degrees of freedom in a spectrum estimate or in the estimate of zero
coherence can be conservatively estimated as ν = N/pmin. Spectral estimates in the
vicinity of sharp spectral peaks are likely to have large bias and variance
errors. Asymptotically the estimates of the troughs of sharply peaked
spectral time series do appear to follow the ν = N/pmin law. AR modeled spectral
estimates do clearly have the resolution property to reproduce sharp spectral
peaks without introducing the bumpiness or large variance that is
characteristic of conventional windowed periodogram spectral analysis.

REFERENCES 


1. Akaike, H., 1969, "Power Spectrum Estimation Through Autoregressive
Model Fitting", Ann. Inst. Stat. Math.

2. Kromer, R. E., 1969, "Asymptotic Properties of the Autoregressive
Spectral Estimator", Tech. Rept. No. 13, Dept. of Statistics, Ph.D.
Dissertation, Stanford University.

3. Parzen, E., 1970, "Multivariate Time Series Modeling", in Multivariate
Analysis II, P. R. Krishnaiah, Ed., Academic Press, N.Y.

4. Berk, K. N., 1974, "Consistent Autoregressive Spectral Estimate", Annals
of Stat. 2, P. 489-502.

5. Huzii, M., February 1976, "On a Spectral Estimate Obtained by Fitting
an Autoregressive Model Fitting", Technical Report No. 22, Stanford
Dept. of Statistics, T. W. Anderson, Project Director.

6. Hannan, E. J., 1970, "Multiple Time Series", Wiley, N.Y.

7. Gersch, W. and Tharp, B. R., 1976, "Spectral Regression - Amount of
Information Analysis of Seizures in Humans", in Quantitative Analytic
Studies in Epilepsy, Kellaway, P. and Petersen, I. S., Edits., Raven
Press, New York, P. 508-532.

8. Gersch, W. and Yonemoto, J., 1977, "Synthesis of Multivariate Random
Vibration Systems: A Two Stage Least Squares AR-MA Model Approach",
Jour. of Sound and Vibration, 52(4), P. 553-565.

9. Gersch, W. and Yonemoto, J., 1977, "Parametric Time Series Models for
Multivariate EEG Analysis", Computers and Biomedical Research, 10,
P. 113-125.

10. Gersch, W. and Yonemoto, J., 1977, "Automatic Classification of
Multivariate EEGs Using an Amount of Information Measure and the Eigenvalues
of Parametric Time Series Model Features", Computers and Biomedical
Research, 10, P. 297-318.

11. Whittle, P., 1963, "On the Fitting of Multivariate Autoregressions and
the Approximate Canonical Factorization of a Spectral Density Matrix",
Biometrika 50, P. 129-134.

12. Akaike, H., 1974, "A New Look at Statistical Model Identification", IEEE
Trans. on Automatic Control AC-19, P. 716-723.

13. Akaike, H., 1976, "Canonical Correlation Analysis of Time Series and the
Use of an Information Criterion", System Identification: Advances and
Case Studies, Mehra, R. K. and Lainiotis, D. G., Edits., Academic Press,
P. 27-96.

14. Jenkins, G. M. and Watts, D. G., 1968, "Spectral Analysis and Its
Applications", Holden-Day, San Francisco.

15. Akaike, H. and Yamanouchi, Y., 1962, "On the Statistical Estimation of
Frequency Response Function", Ann. Inst. Statist. Math., P. 23-56.

16. Brillinger, D. R., 1975, "Time Series, Data Analysis and Theory",
Holden-Day, San Francisco.

17. Koopmans, L. H., 1974, "The Spectral Analysis of Time Series", Academic
Press, N.Y.

18. Amos, D. E. and Koopmans, L. H., 1963, "Tables of the Distribution of
the Coefficient of Coherence for Stationary Bivariate Gaussian
Processes", Sandia Corp. Monograph SCR-483, Albuquerque, New Mexico.

FIGURES 


FIGURE  1:  AR  modeled  pmin  7 (solid  lines)  and  Parzen  lag  50  windowed 
periodogram  spectral  analysis  (dashed  lines)  of  simulated  structural 
system  random  vibrations. 

FIGURE 2: Theoretical and mean values (a) N=500, (b) N=2000;
mean and mean ± standard deviation, (c) N=500, (d) N=2000 of
the average power spectral density versus frequency.
(Axes: spectral density in dB versus normalized frequency.)

Abstract

A rational spectral model is derived for high resolution spectral
estimation of data containing autoregressive signal, interference and white
noise.

A computationally efficient method for estimating the rational spectrum
is introduced. Examples of the spectra calculated by this method are presented
and compared with the autoregressive spectral estimator.

Introduction 

Recent interest in power spectral estimation has been focussed on data
adaptive methods. This is because such methods are free from the effects of
fixed window functions associated with the traditional Blackman and Tukey
type methods for the estimation of spectra. One class of data adaptive
spectral estimators is based on rational spectral models. Such a method was
proposed by Tretter and Steiglitz [1] among others, e.g. [2]. The general
rational model for spectra, however, has not gained popularity in practice
due to computational complexities and lack of understanding of the statistical
properties of such estimators.

A subclass of rational spectral models, however, has become very popular
in high resolution spectral estimation applications. This subclass is the
all-pole model and includes the familiar autoregressive (AR) and maximum entropy
(MEM) spectral estimators.

The properties of the AR spectral estimator have been studied
theoretically in the asymptotic case [3], [4], [5] and empirically [6] - [9]. It has
been shown that this estimator in many cases offers considerably higher
resolution, based on the same amount of data, than the Blackman and Tukey type
estimators. Furthermore, the above asymptotic and empirical investigations
have shown the variance of the AR estimates to be comparable to the unsmoothed
Blackman and Tukey type estimates, for the same number of autocorrelation
lags. It should be pointed out, however, that the AR estimates usually
require much fewer lags for the same resolution, thus avoiding the problem of
instability in the estimates.


One problem that has usually not been discussed in conjunction with the
AR estimator has been its behavior in resolving spectral peaks from noisy
data. Whereas theoretical treatment of this problem is difficult, it is one
of great practical importance in spectral estimation where the desired
signals are usually buried in noise. Two approximate and empirical studies
on the resolution of AR spectral estimators have pointed out the dependence
of its resolution on the signal-to-noise ratio. Lacoss [6] and Kaveh and
Cooper [8] have shown that the order of the AR spectral estimator for the
same resolution of a sinusoid and that of a narrow-band process with additive
wide-band independent noise is approximately equal to the square root of
the signal-to-noise ratio. Thus, for noisy signals one may have to resort to
a high order estimator for its resolving capabilities and, therefore, become
vulnerable to unstable estimates. To remedy this situation, this paper
discusses spectral estimation with models containing both poles and zeros.
Specifically, rational models are introduced as a means of resolving the
spectra of narrowband signals, with L-th order AR models, in the presence of
noise and interference. Preliminary examples using an ad-hoc method for the
estimation of spectra are presented.


II.  The  Autoregressive  (AR)  Spectral  Estimator 

A zero-mean time series {x_t} is said to satisfy an L-th order autoregressive
model if:

    x_t = \sum_{i=1}^{L} a_i x_{t-i} + u_t                              (1)

where {a_i} denote the AR coefficients and {u_t} is a zero-mean uncorrelated
(white) sequence.  Another interpretation of the model in (1) is that {a_i}
represent an L-th order one-step-ahead linear predictor of {x_t}.  If {a_i} are
then estimated based on a minimum mean square error criterion, {u_t} on the
average becomes an orthogonal sequence.  It can be shown [10] that the model
in (1) leads to a spectral density of the form

    S_L(f) = \frac{S_1}{\left| 1 - \sum_{k=1}^{L} a_k e^{-j 2\pi f k \Delta T} \right|^2}    (2)

where S_1 is the spectral level of the {u_t} sequence.

The spectral density shown in (2) is the AR spectrum, and it is this
model that is fitted to an observed time series by simply estimating {a_i}
from the time series.  Several estimation procedures for {a_i} have been
discussed in the literature, such as the maximum likelihood, the least-squares
[10] and a method based on forward and backward prediction error filtering
[11].  If the number of data samples is not very small, the simplest and
computationally most efficient estimates of {a_i} are from a solution of the
Yule-Walker equations.  These equations arise simply by multiplying equation
(1) by x_{t-j}, j = 1, ..., L, and taking the expectation of both sides, to
obtain a relation between the autocorrelation function of the process {x_t}
and the coefficients {a_i}.  The Yule-Walker equations are then given by:

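    R A = \rho                                                          (3)

where

    R = \begin{bmatrix}
          r_0     & r_1     & \cdots & r_{L-1} \\
          r_1     & r_0     & \cdots & r_{L-2} \\
          \vdots  &         & \ddots & \vdots  \\
          r_{L-1} & r_{L-2} & \cdots & r_0
        \end{bmatrix},
    A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_L \end{bmatrix}
    and
    \rho = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_L \end{bmatrix}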

and where r_i is the autocorrelation function of {x_t} at the i-th lag.
Furthermore, the power in {u_t} can be found by multiplying (1) by x_t and
taking the expectation as:

    S_1 = \left[ r_0 - \sum_{i=1}^{L} a_i r_i \right] \Delta T,   \Delta T the sampling interval    (4)

In practice {a_i} and S_1 are estimated from (3) and (4) based on estimates of
the autocorrelation function {r_i}.  Figure [1] shows a comparison of the
spectral estimates of a radar doppler spectrum by the AR method and the
traditional Blackman and Tukey method with a Hanning taper.  The resolving
capability of the AR method, based on far fewer lags L, is quite evident in
this example.
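
The procedure of equations (2)-(4) can be sketched in a few lines of Python.
This is only an illustrative sketch, not the code used for Figure [1]: the
function names `yule_walker` and `ar_spectrum` are hypothetical, the sign
convention x_t = sum a_i x_{t-i} + u_t is assumed, and the autocorrelation
values are taken as given (exact or estimated).

```python
import numpy as np

def yule_walker(r, L):
    """Solve R A = rho (eq. 3) for a_1..a_L given lags r_0..r_L."""
    r = np.asarray(r, dtype=float)
    # Symmetric Toeplitz matrix whose first row is r_0, r_1, ..., r_{L-1}.
    R = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])
    rho = r[1:L + 1]
    a = np.linalg.solve(R, rho)
    # Driving-noise power r_0 - sum a_i r_i (eq. 4 without the dt factor).
    power_u = r[0] - a @ rho
    return a, power_u

def ar_spectrum(a, power_u, f, dt=1.0):
    """Evaluate eq. (2) at frequencies f (Hz) for sampling interval dt."""
    k = np.arange(1, len(a) + 1)
    phase = np.exp(-2j * np.pi * np.outer(f, k) * dt)
    return power_u * dt / np.abs(1.0 - phase @ a) ** 2

# Exact lags of a first-order process, r_k = 0.8**k, fitted with L = 3.
a, power_u = yule_walker(0.8 ** np.arange(4), 3)
print(a, power_u)
```

For exact first-order lags the higher coefficients come out as zero, which
is a convenient sanity check before applying the routine to estimated lags.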


III.  Spectral Estimation in the Presence of Noise and Interference

We now assume that the signal {x_t} satisfies an L-th order AR model and
therefore its spectrum can be estimated as in the previous section.  The
problem of interest is the estimation of the spectrum of the observed signal

    y_t = x_t + \omega_t + n_t                                          (5)

where {\omega_t} is an AR(M) process, considered to be the interference, and
{n_t} is a white noise sequence, with {n_t}, {x_t} and {\omega_t} mutually
uncorrelated.  The resolution of the spectral estimators is now discussed in
the asymptotic case, that is, when the autocorrelation function of {y_t} is
accurately known.


Let {x_t} be described by (1) and {\omega_t} be given by the following
autoregressive model

    \omega_t = \sum_{i=1}^{M} b_i \omega_{t-i} + v_t

The z-spectrum of {y_t} is then given by:

    S_y(z) = \frac{S_1}{D_x(z) D_x(z^{-1})} + \frac{S_2}{D_\omega(z) D_\omega(z^{-1})} + S_n    (6)

where S_1 and S_2 are given by (4) using the appropriate autocorrelation values
for {x_t} and {\omega_t}, S_n is the spectrum of n_t, and
D_x(z) = 1 - \sum_{i=1}^{L} a_i z^i and D_\omega(z) = 1 - \sum_{i=1}^{M} b_i z^i
are the polynomials formed from the AR coefficients of the signal and of the
interference.  Putting (6) under a common denominator, it becomes obvious that
S_y(z) is the spectrum of an autoregressive moving average process of orders
L+M and L+M, i.e., AR(L+M)/MA(L+M).  This is a process with L+M zeros and L+M
poles, where, from equation (6), the numerator polynomial coefficients are
related to {a_i}, {b_i} and S_n in an obvious manner.
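
The common-denominator step can be checked numerically. The following Python
sketch is an illustration only; the helper names `combine_spectra` and
`rational_spectrum` are hypothetical, and the polynomials are assumed to have
the form 1 - sum a_i z^i with z = exp(-j omega). It builds the symmetric
numerator lags and the denominator of order L+M and compares the result with
the three-term sum of (6).

```python
import numpy as np

def combine_spectra(a, b, s1, s2, sn):
    """Return numerator lags c_0..c_K and denominator d_0..d_K, K = L+M."""
    dx = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # D_x
    dw = np.concatenate(([1.0], -np.asarray(b, dtype=float)))  # D_omega
    d = np.convolve(dx, dw)                                    # D = D_x D_omega

    def lag_products(p):
        # Coefficients of P(z)P(1/z) for lags 0..len(p)-1.
        return np.correlate(p, p, mode="full")[len(p) - 1:]

    c = sn * lag_products(d)          # white-noise term times D(z)D(1/z)
    cw, cx = lag_products(dw), lag_products(dx)
    c[:len(cw)] += s1 * cw            # signal term picks up D_omega
    c[:len(cx)] += s2 * cx            # interference term picks up D_x
    return c, d

def rational_spectrum(c, d, omega):
    """Evaluate sum_k c_k z^k / |D(z)|^2 on z = exp(-j omega)."""
    k = np.arange(1, len(c))
    num = c[0] + 2.0 * np.cos(np.outer(omega, k)) @ c[1:]
    den = np.abs(np.exp(-1j * np.outer(omega, np.arange(len(d)))) @ d) ** 2
    return num / den

c, d = combine_spectra([0.9], [-0.5, -0.3], 1.0, 2.0, 0.5)
print(len(c) - 1, len(d) - 1)   # both orders equal L + M
```

With L = 1 and M = 2 both the numerator and denominator have order 3, in
line with the AR(L+M)/MA(L+M) statement above.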

It can be seen that the estimation of S_y(z) using a purely AR technique
(all-pole) is equivalent to approximating the (L+M) order moving average
component by an AR one.  This, theoretically, would require an infinite order
model.  Finite order models of order L, however, will estimate S_y(f) with
good resolution, with L depending on the various parameters of signal, noise
and interference, notably their relative power.

It is obvious now that whereas the resolution of Blackman and Tukey type
spectral estimators is determined only by the window bandwidths in a
predictable fashion, that of the AR and generally ARMA estimators is very
much data dependent, requiring larger all-pole orders or ARMA modeling.

The general problem of the identification of ARMA models for time series
has been treated by many; see for example [11], [12], [13].  The methods
require the solution of non-linear equations and involve constrained
minimization techniques.  Since in the present problem only quadratic
functions of the moving average parameters are needed, an ad-hoc technique for
the calculation of S_y(f) given the autocorrelation function of {y_t} is
introduced.  This method is then used to calculate some spectra of noisy
signals based on theoretical values of the autocorrelation functions.  This is
done to demonstrate the resolution aspects of the AR and associated ARMA
models for the observed signal.


Let {y_t} have an autocorrelation function {r_i}, and denote S_y(z) resulting
from equation (6) by the rational form:

    S_y(z) = \frac{B(z) B(z^{-1})}{D(z) D(z^{-1})}                      (7)

where B(z) = \sum_{i=0}^{L+M} \beta_i z^i, D(z) = \sum_{i=0}^{L+M} d_i z^i with
d_0 = 1, and d_i the AR coefficients in the ARMA representation of {y_t}.  It
is noted that, for spectral estimation, only quadratic functions of \beta_i
are required.  This is obvious since


    S_y(z) = \frac{\sum_{k=-K}^{K} c_k z^k}{D(z) D(z^{-1})}             (8)

where K = L+M and c_k are related to \beta_i in an obvious fashion.  The scheme
for computing S_y(z) is now to first identify {d_i}.  This can be done by a
number of techniques, see for example [10] and [14].  In this paper {d_i} are
calculated as the solution of modified Yule-Walker equations given by

    R_K A = \rho_K                                                      (9)

where A is given in formula (3) (now with K entries, so that d_i = -a_i under
the sign convention of (1)) and

    R_K = \begin{bmatrix}
            r_K      & r_{K-1}  & \cdots & r_1 \\
            r_{K+1}  & r_K      & \cdots & r_2 \\
            \vdots   &          &        & \vdots \\
            r_{2K-1} & r_{2K-2} & \cdots & r_K
          \end{bmatrix}
    and
    \rho_K = \begin{bmatrix} r_{K+1} \\ r_{K+2} \\ \vdots \\ r_{2K} \end{bmatrix}

Following the calculation of {d_i}, {c_k} can be evaluated very simply by
observing that by definition

    S_y(z) = \sum_{i=-\infty}^{\infty} r_{|i|} z^i                      (10)

Equating the right-hand sides of (8) and (10), {c_k} can be found as

    c_k = c_{-k} = \sum_{i=0}^{K} \sum_{j=0}^{K} d_i d_j r_{|k+i-j|}    (11)
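
The ad-hoc scheme of equations (8)-(11) can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not the program used for
the figures: the function names are hypothetical, the lags r_0..r_{2K} are
assumed to be supplied, D(z) is taken as sum d_i z^i with d_0 = 1 and
d_i = -a_i, and frequency is normalized so that z = exp(-j omega).

```python
import numpy as np

def modified_yule_walker(r, K):
    """Solve R_K A = rho_K (eq. 9); return d_0..d_K with d_0 = 1."""
    # Row i of R_K is r_{K+i}, r_{K+i-1}, ..., r_{i+1} (i counted from 0).
    R = np.array([[r[K + i - j] for j in range(K)] for i in range(K)])
    rho = np.array([r[K + 1 + i] for i in range(K)])
    a = np.linalg.solve(R, rho)
    return np.concatenate(([1.0], -a))

def numerator_lags(r, d):
    """c_k = sum_i sum_j d_i d_j r_|k+i-j| for k = 0..K (eq. 11)."""
    K = len(d) - 1
    c = np.zeros(K + 1)
    for k in range(K + 1):
        for i in range(K + 1):
            for j in range(K + 1):
                c[k] += d[i] * d[j] * r[abs(k + i - j)]
    return c

def arma_spectrum(r, K, omega):
    """Evaluate eq. (8) on the unit circle at normalized frequencies omega."""
    r = np.asarray(r, dtype=float)
    d = modified_yule_walker(r, K)
    c = numerator_lags(r, d)
    k = np.arange(1, K + 1)
    num = c[0] + 2.0 * np.cos(np.outer(omega, k)) @ c[1:]
    den = np.abs(np.exp(-1j * np.outer(omega, np.arange(K + 1))) @ d) ** 2
    return num / den

# First-order signal (pole 0.9, unit power) plus white noise of power 0.5.
r = np.array([0.9 ** k + (0.5 if k == 0 else 0.0) for k in range(3)])
print(arma_spectrum(r, 1, np.array([0.0, np.pi / 2, np.pi])))
```

With exact lags of a first-order signal in white noise, K = 1 recovers the
true spectrum, because the modified equations use only lags beyond the
moving-average order and are therefore unaffected by the white noise term.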


Figures [2] - [4] show the spectra of narrowband processes calculated from
their theoretical autocorrelation functions using the AR (all-pole) and ARMA
formulations as in formulae (2) and (9) with z = e^{-j 2\pi f \Delta T}.  The
theoretical increase in the order of the AR spectral estimator for the same
resolution of the signal in the presence of noise and interference is evident.
It is noted that for strong noise and interference the ARMA model resolves the
signal with much lower combined order than the AR one.


The above derivations and examples were based on exact autocorrelation
functions.  In practice, sample estimates of the correlation functions are
used.  Statistical properties of the rational spectral estimator introduced
here are at this point unresolved.  Figure [5] shows the spectra of a
narrowband process observed in white Gaussian noise.  The autocorrelation
samples were estimated as

    \hat{r}_i = \frac{1}{N} \sum_{t=0}^{N-i} y_t y_{t+i}                (12)

It is noted that higher order AR and ARMA models are needed for high
resolution estimation of the noisy signal.  An obvious explanation of this is
the fact that, for finite N, the noise-power dependent, non-zero residual
sample cross-correlation between {n_t} and {x_t} and the sample autocorrelation
function of {n_t} prevent the total disentangling of the signal and noise
information, as for example through the modified Yule-Walker equations.  This
also brings out the dependence of the resolution of the above techniques on the
relative signal to noise and interference power.
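
A short Python sketch of the lag estimator in (12) is given below for
illustration. The function name `sample_autocorrelation` is hypothetical, and
the sketch assumes N samples indexed from zero, so each sum runs over the
N - i products that are actually available.

```python
import numpy as np

def sample_autocorrelation(y, max_lag):
    """Biased lag estimates: (1/N) * sum_t y[t] * y[t+i], i = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    # For lag i only N - i products exist; dividing by N biases the estimate
    # toward zero at long lags.
    return np.array([y[:N - i] @ y[i:] / N for i in range(max_lag + 1)])

print(sample_autocorrelation([1.0, 2.0, 3.0], 2))
```

Dividing by N rather than by N - i is the choice shown in (12); the lags
produced this way can be passed directly to a Yule-Walker or modified
Yule-Walker solver.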


Conclusions 


Spectral  estimation  of  signals,  contaminated  by  noise  and  interference, 
using  rational  spectral  models  was  discussed.  The  resolution  of  the  auto- 
regressive spectral  estimator  was  related  to  the  change  in  the  model  of  the 
measured  data.  In  such  cases,  high  resolution  spectral  estimates  require 
relatively  high  order  AR(L)  or  an  AR(K)/MA(K)  model  with  K < L/2.  An  ad-hoc 
method  was  introduced  for  estimating  the  ARMA  (rational)  spectrum  in  a 
computationally  efficient  manner,  based  on  reasonably  accurate  estimates  of 
the  correlation  function.  For  relatively  small  sample  sizes,  optimum  but 
computationally  inefficient  ARMA  identification  methods  can  be  used  off-line. 
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Figures 


FIGURE 1.  Radar power spectrum.  \Delta T = .005 msec, 2000 samples.

FIGURE 2.  Calculated power spectrum for
r_k = exp(-2\pi |k| \Delta T) cos(2\pi x 750 k \Delta T), \Delta T = 1/2048.

FIGURE 4.  Calculated power spectrum for ... cos(2\pi x 750 k \Delta T) + 0. ...
(caption incomplete in the source; curve label AR(30)/MA(10)).

FIGURE 5.  Estimated power spectrum for the signal and noise of Figure 3.
N = 500 samples.
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STATE -VARIABLE^ TECHNIQUES 
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ABSTRACT 

The problem of spectral estimation is formulated as a fundamental problem in
the detection of a random process signal in noise.  The generalized likelihood
ratio test for signals generated by passing white noise through a
state-variable linear filter is computed and shown to depend on the energy in
the prediction residuals as generated by a Kalman filter.  For an all-pole
filter and for large signal-to-noise ratio (SNR), it is shown that the maximum
likelihood spectral estimator corresponds to solving the same normal equations
as for the Maximum Entropy Method (MEM).  For low SNR or for other filter
characterizations the MEM technique is no longer optimum.  In these cases
numerical algorithms can be used to produce the maximum likelihood spectral
estimates.  When the signal generator consists of a linear combination of
damped sinusoids the Kalman filter realization leads to an estimation
algorithm that may have the ability to resolve multiple targets that are
closely spaced in frequency and contaminated by additive background noise.
The essence of high resolution is evident in the Kalman filter solution.

INTRODUCTION 

The Maximum Entropy or Linear Prediction method of spectral estimation has
been widely used in those situations where high resolution is required even
though the data record lengths are short [1].  Theoretically the method is
justified on the basis that, for an all-pole signal model and no background
noise to distort the signal, the MEM technique minimizes the energy in the
prediction residuals [2].  The method is then applied to problems in which the
signals do not fit the all-pole model and are often contaminated by noise.
In this paper an attempt is made to solve the general problem of optimally
estimating the spectrum of a signal which is observed in additive noise.

PROBLEM  FORMULATION  AND  SOLUTION 

In radar the fundamental problem is to detect the presence of a signal based
on N samples of a noisy waveform.  This corresponds to the hypothesis test

    H_0: y(n) = w(n)
    H_1: y(n) = s(n) + w(n)                                             (1)

where w(n) and s(n) represent the n'th sample of the noise and signal
waveforms respectively.  The signal and noise are assumed to be sample
functions of independent, zero mean, Gaussian random processes.  The noise
has variance \sigma_w^2.  If an Autoregressive Moving Average (ARMA) model is
used to characterize the signal, then

    s(n) = \sum_{k=1}^{p} a_k s(n-k) + \sum_{\ell=0}^{q} b_\ell u(n-\ell)    (2)

where the driving noise, u(n), is zero mean, Gaussian white noise with
variance \sigma_u^2.  If a parallel resonator model is used to characterize
the signal, then

    s(n) = \sum_{m=1}^{M} s_m(n)                                        (3a)

    s_m(n) = a_{m1} s_m(n-1) + a_{m2} s_m(n-2) + a_{m0} u(n)            (3b)

where the resonator coefficients (a_{m0}, a_{m1}, a_{m2}) completely specify
the amplitude, bandwidth and frequency of the m'th resonance.  Both of the
above models admit a state-variable formulation of the following form:

    x(n) = A x(n-1) + b u(n)                                            (4a)

    s(n) = H x(n)                                                       (4b)

in which A and b are directly related to the ARMA coefficients or the
resonator parameters depending on whether (2) or (3) is used to characterize
the signal generation process.

If these coefficients are known, then so too is the system transfer function
and hence the spectral density of the signal.  In general they are not known
and estimates must be made of them.  An optimum estimate of the power spectrum
can be obtained by using the maximum likelihood method to estimate the unknown
spectral coefficients.  The fundamental theoretical problem is to make use of
the data set y(1), y(2), ..., y(N) to determine if a target is present and,
if it is, to estimate, in an optimum way, the spectral coefficients.
Schweppe [3] has shown that the likelihood functions for such a
detection-estimation problem are

    \ell_0 = N \ln(\sigma_w^2) + \frac{1}{\sigma_w^2} \sum_{n=1}^{N} y^2(n)    (5a)

    \ell_1(a) = N \ln(\sigma_p^2) + \frac{1}{\sigma_p^2} \sum_{n=1}^{N} [y(n) - \hat{s}(n/n-1)]^2    (5b)

where

    \ell_i = -\ln p[y(1), y(2), ..., y(N) / H_i],

\hat{s}(n/n-1) is the minimum-mean-squared-error (mmse) one-step ahead
prediction of s(n) based on y(n-1), y(n-2), ..., and \sigma_p^2 is the
prediction error variance averaged over the ensemble of signal and noise
sample functions.

If the background noise variance is also assumed to be unknown then maximum
likelihood estimates of \sigma_w^2 and \sigma_p^2 can be found by minimizing
(5a) and (5b).  This leads to the likelihood ratio

    \ell(a) = \frac{\sum_{n=1}^{N} [y(n) - \hat{s}(n/n-1)]^2}{\sum_{n=1}^{N} y^2(n)}    (6)

A signal is declared present whenever \ell(a) < \lambda.  Of course the
spectral parameters a are unknown and must also be estimated from the data.
The maximum likelihood estimates (minimum variance, unbiased for large SNR)
are obtained by minimizing \ell(a), which corresponds to minimizing the energy
in the prediction residuals.  Since \hat{s}(n/n-1) is the mmse one-step ahead
prediction of s(n) based on y(n-1), y(n-2), ..., it can be generated by the
Kalman filter, which for the stationary case is given by the following
equations [4]

    e(n) = y(n) - \hat{s}(n/n-1)                                        (7a)

    \hat{s}(n/n-1) = H \hat{x}(n/n-1)                                   (7b)

    \hat{x}(n/n-1) = A \hat{x}(n-1)                                     (7c)

    \hat{x}(n) = \hat{x}(n/n-1) + K e(n)                                (7d)

where the Kalman gain, K, is given by

    K = \frac{1}{\sigma_w^2} P H^T                                      (8a)

    P = M - \frac{M H^T H M}{\sigma_w^2 + H M H^T}                      (8b)

    M = A P A^T + b b^T \sigma_u^2                                      (8c)
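
The stationary gain equations (8a)-(8c) can be solved by simple fixed-point
iteration, and the residuals of (7a)-(7d) then follow from one pass over the
data. The Python sketch below is an illustration only: the function names are
hypothetical, H is assumed to be a 1 x n row vector, and plain iteration from
P = 0 is assumed to converge for the model in question.

```python
import numpy as np

def steady_state_gain(A, b, H, var_u, var_w, iters=500):
    """Iterate (8c) and (8b) to a fixed point, then form the gain via (8a)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for _ in range(iters):
        M = A @ P @ A.T + var_u * np.outer(b, b)                      # (8c)
        P = M - (M @ H.T @ H @ M) / (var_w + (H @ M @ H.T).item())    # (8b)
    K = (P @ H.T).ravel() / var_w                                     # (8a)
    return K, P

def prediction_residuals(y, A, H, K):
    """Run the stationary Kalman filter (7a)-(7d) and return e(n)."""
    x = np.zeros(A.shape[0])
    e = np.empty(len(y))
    for n, yn in enumerate(y):
        x_pred = A @ x                         # (7c)
        e[n] = yn - (H @ x_pred).item()        # (7a) with (7b)
        x = x_pred + K * e[n]                  # (7d)
    return e

# Second-order all-pole example with state [s(n), s(n-1)].
A = np.array([[1.0, -0.5], [1.0, 0.0]])
b = np.array([1.0, 0.0])
H = np.array([[1.0, 0.0]])
K, P = steady_state_gain(A, b, H, var_u=1.0, var_w=0.01)
print(K)
```

The likelihood ratio (6) is then the sum of squared residuals divided by the
sum of squared observations, so a numerical search over the model parameters
only needs these two routines as its inner loop.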

ANALYSIS OF SPECIAL CASES
(a) All-Pole Model

In general computation of the Kalman gains is a difficult task.  For the
special case of a second order all-pole signal model, solution of equation (8)
for large SNR leads to the gains

    K_1 = \frac{\xi}{1 + \xi}                                           (9a)

    K_2 = \frac{a_1 K_1 (1 - K_1)}{1 - a_2 (1 - K_1)}                   (9b)

where \xi = \sigma_u^2 / \sigma_w^2 defines the input signal-to-noise ratio.
If \xi \to \infty, it follows that K_1 \to 1 and K_2 \to 0.  For higher order
systems attempts to solve equation (8) do not lead to simple analytical
expressions for the Kalman gains.  It can be shown, however, that for large
SNR, K_1 = \xi / (1 + \xi), as in equation (9a), while K_i \approx 0,
i = 2, 3, ..., p.  Under these conditions the Kalman filter equations for the
all-pole model can be written as follows:

    e(n) = y(n) - \hat{s}(n/n-1)                                        (10a)

    \hat{s}(n/n-1) = \sum_{k=1}^{p} a_k \hat{s}(n-k)                    (10b)

    \hat{s}(n) = \hat{s}(n/n-1) + K_1 e(n)                              (10c)

where \hat{s}(n) is the mmse estimate of s(n) based on y(n), y(n-1), ....  The
maximum likelihood estimates of the filter coefficients are obtained by
minimizing

    \ell(a_1, a_2, ..., a_p) = \sum_{n=1}^{N} e^2(n)                    (11)

In general it is not possible to determine the maximum likelihood estimates
analytically.  However, as SNR \to \infty, K_1 \to 1 and as a result
\hat{s}(n) \to y(n), hence

    \hat{s}(n/n-1) = \sum_{k=1}^{p} a_k y(n-k)

and

    \ell(a_1, a_2, ..., a_p) = \sum_{n=1}^{N} \left[ y(n) - \sum_{k=1}^{p} a_k y(n-k) \right]^2    (12)

which can be solved analytically.  In fact the normal equations are

    \sum_{k=1}^{p} a_k R_y(k-j) = R_y(j),   1 \le j \le p               (13)

where

    R_y(k) = \sum_{n=1}^{N-k} y(n) y(n+k)

which agree with those obtained for the maximum entropy method (MEM).  As a
consequence it follows that the MEM spectral estimator is optimum for the
special case of an all-pole model provided the SNR is large enough.

For those situations in which non-negligible noise levels exist the MEM
estimates are no longer optimum; however, the maximum likelihood estimates can
be found by minimizing (11) numerically in conjunction with the dynamical
constraint equations (10).
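
The cost that such a numerical search would evaluate can be sketched as
below. This is an illustrative Python sketch only: the name `residual_energy`
is hypothetical, the recursion is started from zero initial estimates, and the
gain K_1 is treated as a given number rather than being recomputed from (8)
for each trial coefficient vector.

```python
import numpy as np

def residual_energy(a, K1, y):
    """Energy (11) of the residuals produced by the recursion (10a)-(10c)."""
    a = np.asarray(a, dtype=float)
    s_hat = np.zeros(len(a))      # s_hat[k-1] holds the estimate of s(n-k)
    energy = 0.0
    for yn in y:
        pred = a @ s_hat                      # (10b)
        e = yn - pred                         # (10a)
        s_new = pred + K1 * e                 # (10c)
        s_hat = np.concatenate(([s_new], s_hat[:-1]))
        energy += e * e                       # running sum for (11)
    return energy

# Noise-free second-order data: with K1 = 1 the cost reduces to (12).
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
y = np.zeros(2000)
for n in range(2000):
    y[n] = u[n]
    if n >= 1:
        y[n] += 1.0 * y[n - 1]
    if n >= 2:
        y[n] -= 0.5 * y[n - 2]
print(residual_energy([1.0, -0.5], 1.0, y), residual_energy([0.7, -0.5], 1.0, y))
```

Any general-purpose minimizer can be wrapped around this function; for large
SNR the minimum coincides with the solution of the normal equations (13).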

(b) Sum of Damped Sinusoids

When the received signal is modelled by a bank of resonators, solving (8)
under the large SNR assumption yields the Kalman gains for the m'th resonator.
These are

    K_{m1} = \frac{\xi_m}{1 + \sum_{i=1}^{M} \xi_i}                     (15a)

    K_{m2} = \frac{a_{m1} K_{m1} (1 - K_{m1})}{1 - a_{m2} (1 - K_{m1})} (15b)

where \xi_m = \sigma_m^2 / \sigma_w^2 represents the SNR in the m'th resonator
channel.  If the state variables for this system are taken to be
x_{m1}(n) = s_m(n), x_{m2}(n) = s_m(n-1), then the Kalman filter can be shown
to be given by

    e(n) = y(n) - \sum_{m=1}^{M} \hat{s}_m(n/n-1)                       (16a)

    \hat{s}_m(n/n-1) = \hat{x}_{m1}(n/n-1)                              (16b)

    \hat{x}_{m1}(n/n-1) = a_{m1} \hat{x}_{m1}(n-1) + a_{m2} \hat{x}_{m2}(n-1)    (16c)

    \hat{x}_{m2}(n/n-1) = \hat{x}_{m1}(n-1)                             (16d)

    \hat{x}_{m1}(n) = \hat{x}_{m1}(n/n-1) + K_{m1} e(n)                 (16e)

    \hat{x}_{m2}(n) = \hat{x}_{m2}(n/n-1) + K_{m2} e(n)                 (16f)

The maximum likelihood estimates are obtained by minimizing the energy in the
residual sequence, which must be done using numerical techniques even for the
noiseless case.  The Kalman filter is illustrated in Fig. 1, from which the
essence of high resolution spectral estimation can be deduced.

For one thing the processor uses the predictions as a means of cancelling the
influence of any one sinusoid on its neighbor.  Secondly, the numerical search
algorithm attempts to tune a second order bandpass filter to allow each sine
wave to pass in such a way that it is optimally reconstructed as
\hat{s}_m(n).  Estimates of the energy in each component sine wave could then
be obtained by computing

    E_m = \sum_{n=1}^{N} \hat{s}_m^2(n)                                 (17)

As a practical matter, the resonator coefficients are related to the
frequency and bandwidth of the m'th resonance through the equations

    a_{m1} = 2 e^{-\beta_m} \cos \omega_m

    a_{m2} = -e^{-2 \beta_m}

where the normalized bandwidth and resonant frequency are
\beta_m = \pi B_m / F_s, \omega_m = 2 \pi f_m / F_s, in which B_m is the 3 dB
bandwidth in Hz, f_m is the resonant frequency in Hz and F_s is the sampling
rate in Hz.  In many cases, estimates of B_m and f_m can be made, which
provide initial estimates for the numerical search algorithm.  In fact, if
reasonable guesses for the bandwidth can be obtained, it is then necessary to
optimize only over the parameters a_{m1}, which specify the frequencies of the
sine waves.
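
The mapping between physical resonance parameters and resonator coefficients
can be sketched in Python as follows. The function names are hypothetical, and
the expression used for a_m2 is an assumption: it is the standard two-pole
relation for a pole pair at exp(-beta +/- j omega).

```python
import numpy as np

def resonator_coefficients(f_m, B_m, F_s):
    """Map resonant frequency and 3 dB bandwidth (Hz) to (a_m1, a_m2)."""
    beta = np.pi * B_m / F_s           # normalized bandwidth
    omega = 2.0 * np.pi * f_m / F_s    # normalized resonant frequency
    a1 = 2.0 * np.exp(-beta) * np.cos(omega)
    a2 = -np.exp(-2.0 * beta)          # assumed standard two-pole relation
    return a1, a2

def resonance_parameters(a1, a2, F_s):
    """Invert the mapping: return (f_m, B_m) in Hz for a complex pole pair."""
    radius = np.sqrt(-a2)                      # pole radius exp(-beta)
    omega = np.arccos(a1 / (2.0 * radius))     # pole angle in (0, pi)
    return omega * F_s / (2.0 * np.pi), -np.log(radius) * F_s / np.pi

a1, a2 = resonator_coefficients(750.0, 2.0, 2048.0)
print(a1, a2, resonance_parameters(a1, a2, 2048.0))
```

The inverse mapping is useful for turning MEM pole estimates into starting
values of f_m and B_m for the numerical search.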



CONCLUSIONS 

The  maximum  likelihood  method 
has  been  applied  to  a large  class  of 
radar  detection  and  estimation 
problems  in  which  the  target  signal 
is  not  known  deterministically  but 
could  be  modelled  as  the  result  of 
passing white noise through a state-variable
linear filter.  It was shown
that  optimum  estimates  for  an  all-pole 
signal  process  reduced  to  those 
obtained  by  the  maximum  entropy  method 
provided  the  background  noise  was 
negligible.  When  the  background  noise 
level  is  not  negligible  the  likelihood 
function  can  be  minimized  numerically 
so  that  an  optimum  spectral  estimate 
can  be  obtained. 

When  the  signal  is  characterized 
as  the  sum  of  damped  sinusoids,  which 
is  a better  representation  for  many 
situations  that  arise  in  radar 
practice,  the  maximum  likelihood 
technique  leads  to  an  algorithm  which 
can  be  solved  numerically  for  the 
optimum  spectral  estimates.  When  the 
data  can  be  processed  off-line  it  is 
reasonable  to  first  apply  the  MEM 
technique  to  get  initial  estimates 
for  the  frequencies  and  bandwidths 
of  each  of  the  second  order  Kalman 
filters  and  then  fine  tune  the 
estimates using the numerical search
algorithm. 
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Abstract 


A method is presented for estimating, from a finite number
of observations, the autocovariance and the spectral density
of a time series X.  Central to our approach is the modeling of
X as an autoregressive process.  In this sense, our approach
is akin to the maximum entropy method of Burg and the
autoregressive spectral estimation method of Parzen.  However, we
obtain the coefficients of the filter that models the process
X by a nonlinear optimization procedure which appropriately
penalizes any departure of the covariance of the process which
drives X from the covariance of a white noise process.  Sample
covariances of the process X are used in the optimization
procedure.  Once the filter coefficients characterizing the process
X are determined, the autocovariance of the process X is
estimated based on the requirement that it satisfy the Yule-Walker
equations; and the spectral density of X is obtained directly
from the filter coefficients and the estimated covariance values.
It is believed that high resolution and accuracy in the
estimation of the spectrum of X are achieved by the method.

Introduction 

Since the classical contributions of Bartlett [1], [2],
the developments on time series spectral estimation have been
many and wide (see for example [3], [4], [5], and [6] for recent
surveys).  Among these developments, a point of view which
appears especially promising is the one in which the series X, whose
spectrum is to be estimated, is modeled as an autoregressive
process.  The maximum entropy method of Burg [7] and the
autoregressive spectral estimation method of Parzen [3] are based on
this point of view.

*This  work  was  supported  in  part  by  the  AFOSR  Grant  75-2777. 


In what follows, we propose a method for the estimation
of the autocovariance and spectral density of X, which is also
based on the modeling of X as an autoregressive process.
However, we calculate the coefficients of the filter which
models X by a nonlinear optimization procedure which forces
the sample covariance of the process Y that drives X (calculated
from the samples of X) to be as close to that of white noise
as desired and appropriate.

Using the filter coefficients, calculated as mentioned
above, the autocovariance of X is estimated by the requirement
that the Yule-Walker equations be satisfied.  The only
assumption made in this calculation is that the covariance of X for
zero lag (variance of X) is equal to its sample value.

Finally, the spectral density of X is estimated from the
filter coefficients and the estimated covariance values using
a well-known formula (equation (21)).

Our method differs from those of Burg [7] and Parzen [3]
in the sense that we separate the issue of finding the optimum
filter from that of estimating the covariance of the process X.
Comments on the specifics and merits of the proposed approach
appear in the following sections.

Derivation  of  the  Optimum  Filter 

Let the time series X = {x_1, x_2, ...} be zero-mean
wide-sense-stationary, with summable and square-summable
autocovariance R_X(\tau) and continuous spectral density f(\omega).

We assume that X may be modeled as an autoregressive
process of order p.  We thus have

    x_t + \sum_{j=1}^{p} w_j x_{t-j} = y_t,   t = 1, 2, ...,            (1)

where the filter weights w_j, j = 1, ..., p, are to be chosen so
as to make the behavior of Y = {y_t} as close to that of a white
noise process as possible.

Through (1), the following relationship is obtained
between the autocovariances of Y and X:

    R_Y(\tau) = R_X(\tau) + \sum_{j=1}^{p} w_j [R_X(\tau+j) + R_X(\tau-j)]
                + \sum_{i=1}^{p} \sum_{j=1}^{p} w_i w_j R_X(\tau+j-i).  (2)
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Defining  now  the  vectors 


    w = col(w_1, ..., w_p),                                             (3)

    r_+(\tau) = col(R_X(\tau+1), ..., R_X(\tau+p)),                     (4)

    r_-(\tau) = col(R_X(\tau-1), ..., R_X(\tau-p)),                     (5)

and the p x p matrix \mathbf{R}_X(\tau) whose ji-th element is
R_X(\tau+j-i), we may express (2) in the concise form

    R_Y(\tau) = R_X(\tau) + w^T (r_+(\tau) + r_-(\tau)) + w^T \mathbf{R}_X(\tau) w,    (6)

where the superscript T denotes the transpose.


To make Y approximate white noise behavior, we appropriately
minimize R_Y(\tau) for \tau \neq 0 over all w.  Specifically,
we select the minimization criterion*

    J(w) = \sum_{\tau=1}^{p} g(\tau) R_Y^2(\tau)
         = \sum_{\tau=1}^{p} \{ g(\tau) [ R_X(\tau) + w^T (r_+(\tau) + r_-(\tau))
                                          + w^T \mathbf{R}_X(\tau) w ]^2 \}.    (7)

(From now on, we will abbreviate \sum_{\tau=1}^{p} simply by \sum.)  Above,
g(\tau) is a nonnegative even penalty function selected to
express the extent to which it is appropriate for Y to
satisfy the white noise hypothesis.  There are two sources
of error which contribute to the broadening of the
autocorrelation of Y at the origin.  One type of error occurs if we
use too low a value for the order p of our model (1) of the
process X.  The other source of error arises from the fact that
we use sample covariances of X rather than the true covariances
in the minimization of (7).  So if the number N of samples of X
is small, values of R_Y(\tau) slightly away from the origin should
not be penalized too heavily.

*Note that for g(\tau) = 1, (7) reduces to a least squares
criterion.

The expression that we suggest for g(\tau) is

    g(\tau) = (\epsilon \tau)^{2n},                                     (8)

where \epsilon is a positive number and n a nonnegative integer.
g(\tau) is equal to unity for \tau = (1/\epsilon), and for large n, g(\tau)
is nearly zero for \tau < (1/\epsilon) and rises sharply to high values
as \tau increases from (1/\epsilon).  So by appropriately selecting the
parameters \epsilon and n, one is able to tune the minimization
criterion (7) to the model and to the data.
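
The criterion (7) with the penalty (8) can be evaluated directly from a table
of covariance lags. The Python sketch below is illustrative only: the names
`penalty` and `criterion` are hypothetical, the sum over the lag is assumed to
run from 1 to p, and lags R_X(0), ..., R_X(2p) are assumed to be supplied
(sample values in practice).

```python
import numpy as np

def penalty(tau, eps, n):
    """Penalty g(tau) = (eps * tau)**(2n) of eq. (8)."""
    return (eps * tau) ** (2 * n)

def criterion(w, Rx, eps, n):
    """Evaluate J(w) of eq. (7); Rx[k] holds R_X(k) for k = 0..2p."""
    w = np.asarray(w, dtype=float)
    p = len(w)

    def R(k):
        return Rx[abs(k)]          # R_X is even in the lag

    J = 0.0
    for tau in range(1, p + 1):
        r_plus = np.array([R(tau + j) for j in range(1, p + 1)])
        r_minus = np.array([R(tau - j) for j in range(1, p + 1)])
        # Matrix whose (j, i) element is R_X(tau + j - i).
        Rmat = np.array([[R(tau + j - i) for i in range(1, p + 1)]
                         for j in range(1, p + 1)])
        Ry = R(tau) + w @ (r_plus + r_minus) + w @ Rmat @ w      # eq. (6)
        J += penalty(tau, eps, n) * Ry ** 2
    return J

# First-order example: x_t - 0.5 x_{t-1} = y_t with unit-variance white y_t.
Rx = np.array([0.5 ** k / 0.75 for k in range(3)])
print(criterion([-0.5], Rx, 1.0, 1), criterion([0.0], Rx, 1.0, 1))
```

At the true weight the driving process is white, so the criterion vanishes,
whereas w = 0 leaves the lag-one covariance of X fully penalized.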


Now, the gradient of (7) is

    \nabla_w J(w)
      = 2 \sum \{ g(\tau) [ R_X(\tau) + w^T (r_+(\tau) + r_-(\tau)) + w^T \mathbf{R}_X(\tau) w ]
                  . [ r_+(\tau) + r_-(\tau) + 2 \mathbf{R}_X(\tau) w ] \}
      = 2 \sum \{ g(\tau) [ R_X(\tau) (r_+(\tau) + r_-(\tau))
                  + ( (r_+(\tau) + r_-(\tau)) (r_+(\tau) + r_-(\tau))^T
                      + 2 R_X(\tau) \mathbf{R}_X(\tau) ) w
                  + ( 2 w^T (r_+(\tau) + r_-(\tau)) \mathbf{R}_X(\tau) w
                      + w^T \mathbf{R}_X(\tau) w (r_+(\tau) + r_-(\tau)) )
                  + 2 w^T \mathbf{R}_X(\tau) w \mathbf{R}_X(\tau) w ] \}.    (9)

The above expression can be conveniently written as

    \nabla_w J(w) = h(w),                                               (10)

h(w) being a p-vector with components defined by

    h_i(w) = a^i + \sum_{j} b^i_j w_j + \sum_{j,k} c^i_{jk} w_j w_k
             + \sum_{j,k,l} d^i_{jkl} w_j w_k w_l,   i = 1, ..., p,     (11)

where the values of the constants a^i, b^i_j, c^i_{jk}, d^i_{jkl} are
obtained by carrying out the summations with respect to \tau
in (9).


The following propositions are clear:

Proposition 1:  If w* is a minimizer of J then

    h(w*) = 0.                                                          (12)

Proposition 2:  If in addition to (12), w* satisfies

    \frac{\partial h_j}{\partial w_j}(w*) > 0,   j = 1, ..., p,         (13)

then w* is a local minimizer of J.

Algorithm:  The above provides the basis for obtaining the
optimum filter weight vector w* by implementation of a standard
minimization algorithm (for example, based on the gradient
method or the Newton-Raphson method) on a digital computer.
For this purpose, we use sample covariances for the various
covariance terms appearing in (7).  Since the value of p is
in general not too high, the algorithm should be easy to run.

Estimation  of  the  Autocovariance  and  the  Spectrum 

With the optimum filter weight vector calculated as in the
preceding section, we proceed to estimate the autocovariance
R_X(\tau) as follows.

We equate R_X(0) to its sample value, that is*

    R_X(0) = \frac{1}{N} \sum_{t=1}^{N} x_t^2.                          (14)

The remaining values of R_X(\tau) are obtained by requiring
that the Yule-Walker equations be satisfied.  Using our previous
notation, these equations take the form

    w*^T r_-(\tau) = -R_X(\tau),   \tau = 1, ..., p.                    (15)

Above, since R_X(-\tau) = R_X(\tau) and the values of the optimum
filter weight vector w* and R_X(0) (according to (14)) are
known, we have exactly p equations in the p unknowns R_X(1),
..., R_X(p), which constitute the components of the vector
r_+(0).  So they can be solved for r_+(0).

*For the sake of generality, we allow the number N of
samples of X to be different from p, if necessary.

For  example,  in  the  case  in  which  p = 4,  we  may  express 
(15)  in  terms  of  a single  matrix  equation  in  the  form 


where 


W r+(0)  ~ q, 


( l+w£ ) 

w • 

3 

w • 

w4 

0 

( w^+w^ ) 

( 1+w* ) 

0 

0 

(w*+w*  ) 

W*1 

1 

0 

W3 

w* 

wi 

1 

q = - R^  ( 0 ) w*  . 


Hence,  r+(0)  is  obtained  by  inverting  (16): 


r+(0)  * W-1q. 


To get the spectrum, first calculate $R_y(0)$ from (6):

$$R_y(0) = R_x(0) + 2\, w^{*T} r_+(0) + w^{*T} \mathbf{R}_x(0)\, w^* , \tag{20}$$

where we have used the fact that $r_+(0) = r_-(0)$. Note also
that all the quantities on the right side of (20) have
been calculated.

The spectral density is finally obtained in terms of
the above quantities by the formula

$$f(\omega) = \frac{R_y(0)}{\left| 1 + \sum_{k=1}^{p} w_k^*\, e^{-i\omega k} \right|^2} .$$
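As an illustration, the spectral density can be evaluated on a frequency grid as sketched below. The sketch assumes the all-pole form with a leading 1 in the denominator and no extra normalizing constant. The name `ar_spectrum` and the sample weights are hypothetical.

```python
import numpy as np

def ar_spectrum(w, ry0, omegas):
    """Evaluate ry0 / |1 + sum_k w_k exp(-i*omega*k)|**2 at each omega."""
    w = np.asarray(w, dtype=complex)
    omegas = np.asarray(omegas, dtype=float)
    k = np.arange(1, w.size + 1)
    # Row n of the matrix holds exp(-i*omega_n*k) for k = 1..p
    transfer = 1.0 + np.exp(-1j * np.outer(omegas, k)) @ w
    return ry0 / np.abs(transfer) ** 2

# One weight, evaluated at omega = 0 and omega = pi
f = ar_spectrum([-0.5], ry0=1.0, omegas=[0.0, np.pi])
```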


Conclusion 


A method has been described for the estimation of the
autocovariance and the spectral density of a stationary
time series. The proposed technique is a variant of the methods
of Burg [7] and Parzen [3]. The proposed framework allows one
to incorporate realistic constraints in the estimation procedure.


Abstract


Under certain conditions, Burg's maximum entropy spectra
of  real  signals  in  the  presence  of  additive  noise  show  either 
spontaneous  line  splitting  (low  noise  levels)  or  appreciable 
frequency  shifting  (at  moderate  noise  levels).  This  difficulty 
arises  because  an  unnecessary  constraint  is  imposed  during 
the  minimization  of  the  mean  error  power.  A similar  problem 
arises  when  the  Burg  technique  is  applied  to  complex  signals. 
This  paper  presents  a solution  to  this  latter  problem. 

Introduction


In  a recent  paper,  Fougere  et  al  [1],  have  shown  that 
under  certain  conditions  the  maximum  entropy  method  using 
Burg's  prediction  error  coefficients  produces  power  spectra 
which  display  spurious  line  splitting  in  the  presence  of  very 
low  noise.  In  a second  paper  hereinafter  called  Paper  II, 
Fougere  [2]  showed  that  this  type  of  splitting  occurred  only 
if  the  noise  level  were  sufficiently  low;  when  the  noise  level 
is  gradually  increased  the  spectrum  is  broadened  and  the 
multiple  peaks  coalesce  into  a single  peak  shifted  substantially 
away  from  the  correct  value.  In  Paper  II,  Fougere  presented 
a solution  to  that  problem  and  showed  that  using  that  solution, 
splitting  was  cured  in  the  low  noise  case  and  shifting  was 
reduced  considerably  in  the  moderate  noise  cases.  These  two 
papers  treated  only  the  case  of  real  input  time  series,  and 
therefore  real  prediction  error  coefficients. 

For  general  information  on  the  maximum  entropy  technique 
see  the  PhD  thesis  by  Burg  [3]  and  papers  by  Smylie  et  al  [4], 
Ulrych  and  Bishop  [5],  Ulrych  and  Clayton  [6],  and  Radoski 
et al [7].

The present paper extends these results to the case of a
complex time series requiring complex prediction error
coefficients. The first observations of spontaneous splitting with
a complex signal in the presence of very low noise were by
R.W. Herring (private communication). This paper will follow
closely the structure given in the "Detailed Mathematics"
section of Paper II. Each of the equations in Paper II will be
written in the appropriate complex form. The equation numbers
will be the same.

Detailed  Mathematics 

We are given an n-point sample $(x_1, x_2, \ldots, x_n)$ of complex
numbers $x_i$, measured at equally spaced values of a single real
independent variable, usually considered to be time. Define
an (m+1) point prediction error filter (PEF)
$(1, g_{m1}, g_{m2}, \ldots, g_{mm})$ where each $g_{mi}$ is a complex variable,
such that the k'th prediction errors are:

$$\varepsilon_{1k} = \sum_{i=0}^{m} x_{k+m-i}\, g_{mi} , \qquad
\varepsilon_{2k} = \sum_{i=0}^{m} x_{k+i}\, g_{mi}^* , \qquad
k = 1, 2, 3, \ldots, n-m \tag{1}$$

where $g_{mi}^*$ is the complex conjugate of $g_{mi}$, $g_{m0} = 1$, and
$\varepsilon_{1k}$ and $\varepsilon_{2k}$ are the forward and backward prediction
errors, respectively.

Now the mean square prediction error, or mean error power,
in both time directions is:

$$P_m = 0.5\,(n-m)^{-1} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \varepsilon_{sk}\, \varepsilon_{sk}^* . \tag{2}$$
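Equations (1) and (2) can be sketched as follows, using zero-based indexing for the sample. This is an illustration only, and the function names are hypothetical. The example uses a single complex exponential, for which a two-point PEF makes both prediction errors vanish.

```python
import numpy as np

def prediction_errors(x, g):
    """Forward and backward errors of eq. (1); g includes the leading 1."""
    x = np.asarray(x, dtype=complex)
    g = np.asarray(g, dtype=complex)
    n, m = x.size, g.size - 1
    k = np.arange(n - m)[:, None]      # error index, zero-based
    i = np.arange(m + 1)[None, :]      # filter tap index
    forward = (x[k + m - i] * g).sum(axis=1)
    backward = (x[k + i] * np.conj(g)).sum(axis=1)
    return forward, backward

def mean_error_power(x, g):
    """Mean error power of eq. (2), averaged over both time directions."""
    forward, backward = prediction_errors(x, g)
    total = np.sum(np.abs(forward) ** 2) + np.sum(np.abs(backward) ** 2)
    return 0.5 * total / forward.size

omega = 0.7
x = np.exp(1j * omega * np.arange(16))
g = np.array([1.0, -np.exp(1j * omega)])
power = mean_error_power(x, g)         # essentially zero for this signal
```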


If the PEF's (with leading "1" suppressed) of all orders
1, 2, ..., m are gathered in one complex matrix $G_m$, we may
write:

$$G_m = \begin{pmatrix}
g_{11} & & & \\
g_{21} & g_{22} & & \\
\vdots & & \ddots & \\
g_{m1} & g_{m2} & \cdots & g_{mm}
\end{pmatrix} . \tag{3}$$

The generalization of the Levinson Algorithm is given by

$$g_{jk} = g_{j-1,k} + g_{jj}\, g_{j-1,j-k}^* . \tag{4}$$

This simple, two term formula allows the off diagonal elements
of the jth row of $G_m$ to be determined whenever the diagonal
elements $\{ g_{jj},\ j = 1, \ldots, m \}$ are known.

Burg has shown that if these diagonal elements (also called
reflection coefficients) all lie in the range $|g_{jj}| < 1$, then the
PEF is minimum phase, that is its Z transform has all its zeroes
outside the unit circle.

In order to enforce this condition we set

$$g_{jj} = U \sin\theta_j\, e^{i\phi_j} , \tag{5}$$

where $\theta_j$ and $\phi_j$ are any real angles and U is a positive constant
slightly less than unity. The discussion in Paper II on the
significance of U can be carried over unchanged to the present
case of complex input data. Briefly, U is adjusted so that all
of the roots of the Z transform of the PEF lie outside the
unit circle and none lie on it.
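The parametrization (5) and the recursion (4) can be combined in a short sketch. The angle values and the default value of U below are arbitrary illustrations. The root check assumes the Z transform is written as a polynomial in the delay variable z, that is $1 + g_{m1} z + \cdots + g_{mm} z^m$.

```python
import numpy as np

def reflection_from_angles(theta, phi, U=0.99999):
    """Eq. (5): the magnitude is at most U, which is less than 1."""
    return U * np.sin(theta) * np.exp(1j * np.asarray(phi))

def pef_from_reflection(refl):
    """Eq. (4): build (g_m1, ..., g_mm) from the diagonal elements g_jj."""
    g = np.zeros(0, dtype=complex)
    for gjj in refl:
        # g_jk = g_{j-1,k} + g_jj * conj(g_{j-1,j-k}), then append g_jj
        g = np.append(g + gjj * np.conj(g[::-1]), gjj)
    return g

theta = np.array([0.3, -1.0, 0.8])
phi = np.array([0.1, 2.0, -1.5])
refl = reflection_from_angles(theta, phi)
g = pef_from_reflection(refl)
# np.roots expects the highest power first: g_mm, ..., g_m1, 1
roots = np.roots(np.concatenate((g[::-1], [1.0])))
```

With every reflection coefficient inside the unit circle, all entries of `roots` are expected to have magnitude greater than one.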

We now follow a method used by Cain et al [8]; variations
in $\theta_j$ and $\phi_j$ are written

$$\theta_j = \theta_j^0 + \Delta\theta_j , \qquad \phi_j = \phi_j^0 + \Delta\phi_j . \tag{6}$$

Now expand the prediction errors in a Taylor series about
$\theta_j^0$ and $\phi_j^0$ and retain only the first two orders:

$$\varepsilon_{sk} = \varepsilon_{sk}^0 + \sum_{j=1}^{m} \left(
\frac{\partial\varepsilon_{sk}}{\partial\theta_j}\, \Delta\theta_j +
\frac{\partial\varepsilon_{sk}}{\partial\phi_j}\, \Delta\phi_j \right) . \tag{7}$$

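Substitute (7) into (2) to get:

$$P_m = 0.5\,(n-m)^{-1} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left|
\varepsilon_{sk}^0 + \sum_{j=1}^{m} \left(
\frac{\partial\varepsilon_{sk}}{\partial\theta_j}\, \Delta\theta_j +
\frac{\partial\varepsilon_{sk}}{\partial\phi_j}\, \Delta\phi_j \right) \right|^2 .$$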


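Set $\partial P_m / \partial\Delta\theta_a = 0$ to find the minimum error power and then
rearrange the resulting equations to get:

$$\sum_{j=1}^{m} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left[
\left( \frac{\partial\varepsilon_{sk}}{\partial\theta_j} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a}
+ \frac{\partial\varepsilon_{sk}^*}{\partial\theta_j} \frac{\partial\varepsilon_{sk}}{\partial\theta_a} \right) \Delta\theta_j
+ \left( \frac{\partial\varepsilon_{sk}}{\partial\phi_j} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a}
+ \frac{\partial\varepsilon_{sk}^*}{\partial\phi_j} \frac{\partial\varepsilon_{sk}}{\partial\theta_a} \right) \Delta\phi_j
\right]
= - \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left(
\varepsilon_{sk}^0 \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a}
+ \varepsilon_{sk}^{0*} \frac{\partial\varepsilon_{sk}}{\partial\theta_a} \right)$$

There are three expressions in parentheses. Note that each
has the form A + A* = 2 Re{A}. Thus the equation becomes

$$\sum_{j=1}^{m} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left[
\mathrm{Re}\!\left( \frac{\partial\varepsilon_{sk}}{\partial\theta_j} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a} \right) \Delta\theta_j
+ \mathrm{Re}\!\left( \frac{\partial\varepsilon_{sk}}{\partial\phi_j} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a} \right) \Delta\phi_j
\right]
= - \sum_{s=1}^{2} \sum_{k=1}^{n-m}
\mathrm{Re}\!\left( \varepsilon_{sk}^0 \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a} \right) \tag{9a}$$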

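Similarly when we set $\partial P_m / \partial\Delta\phi_a = 0$ we get

$$\sum_{j=1}^{m} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left[
\mathrm{Re}\!\left( \frac{\partial\varepsilon_{sk}}{\partial\theta_j} \frac{\partial\varepsilon_{sk}^*}{\partial\phi_a} \right) \Delta\theta_j
+ \mathrm{Re}\!\left( \frac{\partial\varepsilon_{sk}}{\partial\phi_j} \frac{\partial\varepsilon_{sk}^*}{\partial\phi_a} \right) \Delta\phi_j
\right]
= - \sum_{s=1}^{2} \sum_{k=1}^{n-m}
\mathrm{Re}\!\left( \varepsilon_{sk}^0 \frac{\partial\varepsilon_{sk}^*}{\partial\phi_a} \right) \tag{9b}$$

where in both (9a) and (9b) a = 1, 2, ..., m.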


The 2m equations (9) are now linear in the corrections $\Delta\theta_j$
and $\Delta\phi_j$ and can therefore be solved by standard matrix methods.

The corrections are then substituted into (6) and the process
is repeated until the corrections $\Delta\theta_j$ and $\Delta\phi_j$ become
sufficiently small.
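One correction step of (9a) and (9b) can be sketched as a linear solve. The sketch assumes that the derivatives of the stacked errors $\varepsilon_{sk}$ with respect to $(\theta_1, \ldots, \theta_m, \phi_1, \ldots, \phi_m)$ have already been arranged as the columns of a complex matrix J. The random J used here is only a stand-in, not the PEF derivatives.

```python
import numpy as np

def correction_step(jacobian, residual):
    """Solve Re(J^H J) d = -Re(J^H r) for the real corrections d."""
    J = np.asarray(jacobian, dtype=complex)
    r = np.asarray(residual, dtype=complex)
    normal_matrix = np.real(J.conj().T @ J)   # left sides of (9a), (9b)
    right_side = -np.real(J.conj().T @ r)     # right sides of (9a), (9b)
    return np.linalg.solve(normal_matrix, right_side)

# Stand-in problem: errors exactly linear in four real parameters
rng = np.random.default_rng(1)
J = rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))
true_params = np.array([0.3, -0.2, 0.1, 0.5])
residual_at_zero = -J @ true_params
step = correction_step(J, residual_at_zero)
```

For an exactly linear problem a single step recovers the parameters. For the PEF problem the errors are nonlinear in the angles, so the step is repeated as described above.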


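In order to find the derivatives $\partial\varepsilon_{sk}/\partial\theta_j$, $\partial\varepsilon_{sk}^*/\partial\theta_j$,
$\partial\varepsilon_{sk}/\partial\phi_j$ and $\partial\varepsilon_{sk}^*/\partial\phi_j$ we differentiate (1) as follows:

$$\begin{aligned}
\frac{\partial\varepsilon_{1k}}{\partial\theta_j} &= \sum_{i=1}^{m} x_{k+m-i} \frac{\partial g_{mi}}{\partial\theta_j} , &
\frac{\partial\varepsilon_{1k}}{\partial\phi_j} &= \sum_{i=1}^{m} x_{k+m-i} \frac{\partial g_{mi}}{\partial\phi_j} , \\
\frac{\partial\varepsilon_{2k}}{\partial\theta_j} &= \sum_{i=1}^{m} x_{k+i} \frac{\partial g_{mi}^*}{\partial\theta_j} , &
\frac{\partial\varepsilon_{2k}}{\partial\phi_j} &= \sum_{i=1}^{m} x_{k+i} \frac{\partial g_{mi}^*}{\partial\phi_j} ,
\end{aligned} \tag{10}$$

and four equations resulting from (10) by complex conjugation,
for example

$$\frac{\partial\varepsilon_{1k}^*}{\partial\theta_j} = \sum_{i=1}^{m} x_{k+m-i}^* \frac{\partial g_{mi}^*}{\partial\theta_j} .$$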

Next we rewrite eq. (5) and its complex conjugate

$$g = U \sin\theta\, e^{i\phi} , \qquad g^* = U \sin\theta\, e^{-i\phi} , \tag{A}$$

where we have temporarily dropped the subscripts for
convenience. Equations (A) may be inverted to yield:

$$\theta = \theta(g, g^*) , \qquad \phi = \phi(g, g^*) . \tag{B}$$


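If we treat g and g* as independent variables, we can express
the partial derivatives with respect to $\theta$ and $\phi$ as follows:

$$\frac{\partial}{\partial\theta} = \frac{\partial g}{\partial\theta} \frac{\partial}{\partial g} + \frac{\partial g^*}{\partial\theta} \frac{\partial}{\partial g^*} , \qquad
\frac{\partial}{\partial\phi} = \frac{\partial g}{\partial\phi} \frac{\partial}{\partial g} + \frac{\partial g^*}{\partial\phi} \frac{\partial}{\partial g^*} . \tag{C}$$

Substituting $e^{i\phi} = \cos\phi + i \sin\phi$ in (A) and performing the
differentiation indicated in (C) we arrive at:

$$\begin{aligned}
\frac{\partial}{\partial\theta} &= U \cos\theta \left[ \cos\phi \left( \frac{\partial}{\partial g} + \frac{\partial}{\partial g^*} \right) + i \sin\phi \left( \frac{\partial}{\partial g} - \frac{\partial}{\partial g^*} \right) \right] , \\
\frac{\partial}{\partial\phi} &= U \sin\theta \left[ -\sin\phi \left( \frac{\partial}{\partial g} + \frac{\partial}{\partial g^*} \right) + i \cos\phi \left( \frac{\partial}{\partial g} - \frac{\partial}{\partial g^*} \right) \right] .
\end{aligned} \tag{D}$$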

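Write

$$D_j^{\pm} = \frac{\partial}{\partial g_{jj}} \pm \frac{\partial}{\partial g_{jj}^*} . \tag{E}$$

Then

$$\begin{aligned}
\frac{\partial}{\partial\theta_j} &= U \cos\theta_j \left( \cos\phi_j\, D_j^+ + i \sin\phi_j\, D_j^- \right) , \\
\frac{\partial}{\partial\phi_j} &= U \sin\theta_j \left( -\sin\phi_j\, D_j^+ + i \cos\phi_j\, D_j^- \right) .
\end{aligned} \tag{11}$$

Note that because $g_{ij}$ and $g_{ij}^*$ are independent variables,
$\partial g_{ij} / \partial g_{ij}^* = 0$ and $\partial g_{ij}^* / \partial g_{ij} = 0$, for any values of i and j.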


We now apply (E) to the Levinson Algorithm (4). The result
is

$$D_j^{\pm} g_{ik} = D_j^{\pm} g_{i-1,k} + g_{i-1,i-k}^*\, \delta_{ij} + g_{ii}\, D_j^{\pm} g_{i-1,i-k}^* , \tag{12}$$

where the middle term arises because

$$D_j^{\pm} g_{ii} = \left( \frac{\partial}{\partial g_{jj}} \pm \frac{\partial}{\partial g_{jj}^*} \right) g_{ii} = \delta_{ij} . \tag{F}$$

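If we take the complex conjugate of equation (E) we get:

$$\left( D_j^{\pm} \right)^* = \frac{\partial}{\partial g_{jj}^*} \pm \frac{\partial}{\partial g_{jj}} = \pm D_j^{\pm} . \tag{G}$$

Therefore, anytime we need derivatives of any terms in g* we can
write

$$D_j^{\pm} g_{km}^* = \pm \left( D_j^{\pm} g_{km} \right)^* . \tag{H}$$

Finally, the derivatives with respect to $\theta_j$ and $\phi_j$ are
determined by substituting (12) into (11).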


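The gradient of $P_m$ with respect to the independent
variables $\theta_k$ and $\phi_k$ is written:

$$\nabla_{\theta,\phi} P_m = \sum_{a=1}^{m} \left( \frac{\partial P_m}{\partial\theta_a} \hat{\theta}_a + \frac{\partial P_m}{\partial\phi_a} \hat{\phi}_a \right) , \tag{13}$$

where $\hat{\theta}_a$ and $\hat{\phi}_a$ are unit vectors.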

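Applying (13) to (2) yields:

$$\frac{\partial P_m}{\partial\theta_a} = 0.5\,(n-m)^{-1} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \left(
\varepsilon_{sk} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a} + \varepsilon_{sk}^* \frac{\partial\varepsilon_{sk}}{\partial\theta_a} \right)
= (n-m)^{-1} \sum_{s=1}^{2} \sum_{k=1}^{n-m} \mathrm{Re}\!\left( \varepsilon_{sk} \frac{\partial\varepsilon_{sk}^*}{\partial\theta_a} \right) .$$

Thus,

$$\frac{\partial P_m}{\partial\theta_a} = -(n-m)^{-1} \times (\text{right side of (9a)}) . \tag{14a}$$

Similarly,

$$\frac{\partial P_m}{\partial\phi_a} = -(n-m)^{-1} \times (\text{right side of (9b)}) . \tag{14b}$$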

This completes the formal derivation. Clearly these
results must be programmed for a computer before we can test the
method. When such a program has been written and checked out,
it will be made available on request to seriously interested
scientists.

1. Introduction


An important problem in many applications is the determination of the
frequency components of a signal

$$f(t) = \sum_{i=1}^{m} c_i\, e^{j\omega_i t} \tag{1}$$

in terms of the segment

$$w_1(t) = \begin{cases} f(t) + n(t) & |t| < T \\ 0 & |t| > T \end{cases} \tag{2}$$

of f(t) containing the noise component n(t). The signal f(t) is not known for
every t for a variety of reasons:

The signal f(t) can be written as a sum of exponentials for a limited
time only (voice; non-stationary processes).

The available time of observation is limited (sun spots; weather trends).

Measurements are limited by instrument constraints (Michelson
interferometer; band-limited channels).

The unknown frequencies $\omega_i$ and coefficients $c_i$ can be determined
simply with ordinary Fourier transforms if the time of observation 2T is
large compared to all the periods $T_i = 2\pi/\omega_i$. This is not, however, the
case if T is of the order of $T_i$, particularly if the noise component of $w_1(t)$ is
not negligible. In this paper, we present a method which, as we hope to
show, is reliable even in such extreme cases.

The method involves only FFT and it is based on earlier results
dealing with the problem of extrapolating band-limited functions [1, 2].
We review for easy reference the relevant parts of these results.

2.  Extrapolation  of  band-limited  functions 

Consider a function f(t) with Fourier transform $F(\omega)$ such that

$$F(\omega) = 0 , \qquad |\omega| > \sigma . \tag{3}$$

We form the function

$$w_1(t) = \begin{cases} f(t) & |t| < T \\ 0 & |t| > T \end{cases} \tag{4}$$

obtained by truncating f(t) as in Fig. 1. We shall determine f(t) in terms of
$w_1(t)$ by numerical iteration.

First step. We compute the Fourier transform $W_1(\omega)$ of $w_1(t)$, form the
function

$$F_1(\omega) = \begin{cases} W_1(\omega) & |\omega| < \sigma \\ 0 & |\omega| > \sigma \end{cases} \tag{5}$$

compute the inverse transform $f_1(t)$ of $F_1(\omega)$, and form the function

$$w_2(t) = \begin{cases} w_1(t) = f(t) & |t| < T \\ f_1(t) & |t| > T \end{cases} \tag{6}$$

and its Fourier transform $W_2(\omega)$.

This completes the first step of the iteration (Fig. 1).

nth step. We form the function

$$F_n(\omega) = \begin{cases} W_n(\omega) & |\omega| < \sigma \\ 0 & |\omega| > \sigma \end{cases} \tag{7}$$

where $W_n(\omega)$ is the function obtained at the end of the preceding step. We
compute the inverse transform $f_n(t)$ of $F_n(\omega)$, form the function

$$w_{n+1}(t) = \begin{cases} f(t) & |t| < T \\ f_n(t) & |t| > T \end{cases} \tag{8}$$

and compute its Fourier transform $W_{n+1}(\omega)$.
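The iteration (5)-(8) translates directly into a discrete sketch with the FFT. The setup below is an assumption for illustration only: a length-64 sequence, a band of DFT bins $|k| \le 3$ playing the role of $|\omega| < \sigma$, and the first 40 samples playing the role of the known segment. The name `extrapolate_band_limited` is hypothetical.

```python
import numpy as np

def extrapolate_band_limited(segment, known, band, steps):
    """Alternate between the band limit (7) and the known samples (8)."""
    w = np.where(known, segment, 0.0).astype(complex)
    estimates = []
    for _ in range(steps):
        F = np.fft.fft(w) * band            # (7): zero W_n outside the band
        f_n = np.fft.ifft(F)                # inverse transform f_n
        estimates.append(f_n)
        w = np.where(known, segment, f_n)   # (8): restore the known segment
    return estimates

N = 64
bins = np.fft.fftfreq(N, d=1.0 / N)         # integer DFT bin numbers
band = np.abs(bins) <= 3
rng = np.random.default_rng(0)
spectrum = np.where(band, rng.normal(size=N) + 1j * rng.normal(size=N), 0.0)
f = np.fft.ifft(spectrum)                   # band-limited test signal
known = np.arange(N) < 40
estimates = extrapolate_band_limited(f, known, band, steps=50)
errors = [np.sum(np.abs(f - f_n) ** 2) for f_n in estimates]
```

The list `errors` holds the discrete counterpart of the mean-square error between f and each $f_n$, so it can be inspected step by step.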

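If f(t) is approximated by $f_n(t)$, the resulting mean-square error is
given by

$$E_n = \int_{-\infty}^{\infty} |f(t) - f_n(t)|^2\, dt = \frac{1}{2\pi} \int_{-\sigma}^{\sigma} |F(\omega) - F_n(\omega)|^2\, d\omega . \tag{9}$$

We maintain that this error decreases twice at each iteration step.
Indeed,

$$E_n = \int_{|t|<T} |f(t) - f_n(t)|^2\, dt + \int_{|t|>T} |f(t) - f_n(t)|^2\, dt .$$

But [see (8) and (7)]

$$\int_{|t|>T} |f(t) - f_n(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |F(\omega) - W_{n+1}(\omega)|^2\, d\omega
= \frac{1}{2\pi} \int_{|\omega|>\sigma} |F(\omega) - W_{n+1}(\omega)|^2\, d\omega + \frac{1}{2\pi} \int_{-\sigma}^{\sigma} |F(\omega) - F_{n+1}(\omega)|^2\, d\omega .$$

Hence,

$$E_n - E_{n+1} = \int_{|t|<T} |f(t) - f_n(t)|^2\, dt + \frac{1}{2\pi} \int_{|\omega|>\sigma} |F(\omega) - W_{n+1}(\omega)|^2\, d\omega . \tag{10}$$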


In [1] and [2] we show that $f_n(t) \to f(t)$ as $n \to \infty$. This is not true if the
given segment $w_1(t)$ of f(t) is noisy as in (2). In this case, a satisfactory
estimate of f(t) can be found by early termination of the iteration [2].

Note. From (10) it follows that the mean-square error $E_n$ is a monotone
decreasing function and since it is positive, it tends to a limit. This does not
prove the convergence of (9) because the limit need not be zero. It shows,
however, that

$$E_n - E_{n+1} \to 0 , \qquad n \to \infty .$$

Hence,

$$\int_{|t|<T} |f(t) - f_n(t)|^2\, dt \to 0 , \qquad n \to \infty . \tag{11}$$

Although the functions f(t) and $f_n(t)$ are band-limited, (11) does not imply
that $f_n(t) \to f(t)$ because there is no lower bound on the energy concentration
of band-limited functions in a finite interval [1, 3]. For example, the
prolate spheroidal functions $\varphi_n(t)$ are band-limited, their energy equals
one but their energy concentration in the interval (-T, T) tends to zero as
$n \to \infty$. This is the case because the eigenvalues of the underlying
integral equation tend to zero as $n \to \infty$.

We  mention  without  elaboration  that,  in  the  discrete  version  of  the 
problem,  the  convergence  of  the  iteration  can  be  deduced  from  (11)  under 
suitable  conditions.  The  reason  is  that  the  corresponding  eigenvalues  are 
finitely  many,  therefore,  they  have  a positive  minimum. 


3.  Adaptive  extrapolation 


The preceding method was based on the assumption that the unknown
function f(t) is band-limited. This information was used to reduce the
error in the estimation of f(t) twice at each iteration step. The speed of
iteration can be increased and the effects of noise can be reduced if
additional a priori information about f(t) is available. Suppose, for
example, that the size of the band of $F(\omega)$ is known but its precise location
is unknown. We then choose a constant $\sigma$ sufficiently large for $F(\omega)$ to
vanish outside the interval $(-\sigma, \sigma)$ and proceed as in Sec. 2. As the
iteration progresses, the form of $W_n(\omega)$ suggests appropriate reduction of
the assumed band of f(t).

The adaptive extrapolation method is particularly effective if f(t) is a
sum of exponentials as in (1). In this case, $F(\omega)$ consists of impulses
(lines) as in Fig. 2:

$$F(\omega) = 2\pi \sum_{i=1}^{m} c_i\, \delta(\omega - \omega_i) \tag{12}$$

and our problem is to determine their locations $\omega_i$ and amplitudes $c_i$ in
terms of the known segment $w_1(t)$ of f(t).

To solve this problem, we select a constant $\sigma$ larger than the largest
possible value of $\omega_i$ and we proceed with the iteration until $W_n(\omega)$ takes
significant values only in a subset $B_n$ of the band $(-\sigma, \sigma)$ of f(t) (Fig. 3).
This suggests that the unknown frequencies are in $B_n$. When this is
observed, the function $F_n(\omega)$ of the nth iteration step is obtained from the
following modification of (7):

$$F_n(\omega) = \begin{cases} W_n(\omega) & \omega \in B_n \\ 0 & \omega \in \bar{B}_n \end{cases} \tag{13}$$

In the above, $B_n$ is the set of points such that $|W_n(\omega)|$ exceeds a
threshold level $\varepsilon_n$:

$$|W_n(\omega)| \;\begin{cases} > \varepsilon_n & \omega \in B_n \\ < \varepsilon_n & \omega \in \bar{B}_n \end{cases} \tag{14}$$

and $\bar{B}_n$ its complement. The process is repeated until $W_n(\omega)$ approaches
the unknown spectrum. This can be checked by comparing the inverse
$w_n(t)$ of $W_n(\omega)$ with the known segment of f(t).
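The modification (13)-(14) changes only how the spectrum is masked at each step. The sketch below is a hedged illustration and not the authors' program. The threshold is taken as a fraction of the largest in-band magnitude, and the schedule `fractions` is a hypothetical choice that starts small and grows, as suggested in the notes that follow. The signal, band and segment length are arbitrary.

```python
import numpy as np

def adaptive_extrapolate(segment, known, band, fractions):
    """Iterate (13)-(14); fraction * max|W_n| stands in for the threshold."""
    w = np.where(known, segment, 0.0).astype(complex)
    keep = np.zeros(w.size, dtype=bool)
    for fraction in fractions:
        W = np.fft.fft(w)
        magnitude = np.abs(W) * band                     # look only inside the band
        keep = magnitude > fraction * magnitude.max()    # (14): the set B_n
        f_n = np.fft.ifft(np.where(keep, W, 0.0))        # (13)
        w = np.where(known, segment, f_n)                # restore known segment
    return w, np.fft.fft(w), keep

N = 256
t = np.arange(N)
f = np.exp(2j * np.pi * 10 * t / N) + np.exp(2j * np.pi * 40 * t / N)
known = t < 30
band = np.abs(np.fft.fftfreq(N, d=1.0 / N)) <= 64
fractions = np.linspace(0.1, 0.6, 60)
w, W, keep = adaptive_extrapolate(f, known, band, fractions)
```

Comparing `w` with the known segment, as the text suggests, is the natural check on the result. The retained set `keep` indicates where the estimated lines lie.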

Notes. As the following examples show, the unknown frequencies can be
found even if the data are noisy and the constant T is smaller than the
smallest period $T_i$.

The choice of the threshold level $\varepsilon_n$ is dictated by two conflicting
factors: For a speedy convergence and reduction of the noise component,
$\varepsilon_n$ must be large. It must be sufficiently small so that all frequency
components of $F(\omega)$ are in the set $B_n$. Thus $\varepsilon_n$ is small at first and it
increases as the iteration progresses.

The accuracy of the method depends on the number m of the unknown
components and their relative locations and amplitudes. If some
components are small compared to the maximum $c_i$, it is possible that they
could be lost. However, if the noise is sufficiently small, they can be
recovered by subtracting the significant components and repeating the
iteration.

A priori knowledge of the number m of the unknown frequencies is
useful but not essential.

If, at the nth iteration step, all frequency components of f(t) are in the
set $B_n$, then the resulting mean-square error reduction is given by (10)
mutatis mutandis.

4.  Illustrations 


We conclude with a digital implementation of the above method. The
computations were performed with a PDP 11 minicomputer (single precision)
and the FFT size was N = 256. The known segment $w_1(t)$ of f(t) contains the
first 30 points. In figure 4 the unknown signal consists of 3 cosine waves.
Their amplitudes and frequencies are not apparent from the
Fourier transform $W_1(\omega)$, figure 4b, of the given segment. The resolution
is improved considerably within a few steps of the iteration. In the 12th
step the three cosine terms have been revealed, figure 4c, and in the 63rd
step complete estimation of frequencies and amplitudes has been achieved
(figure 4d).

In  the  next  example,  figure  5,  the  same  signal  is  used  but  we  added 
10%  white  noise  uniformly  distributed.  The  signal  is  again  recovered 
completely  in  100  steps.  In  figure  6 the  unknown  signal  consisting  of  two 
exponentials  has  been  corrupted  by  20%  white  noise.  To  recover  the  signal 
the  iteration  needs  30  steps. 


Thus, as we see from these illustrations, the unknown frequencies can
be found even if the data are noisy and the constant T is smaller than the
smallest period in f(t).

FIGURE 1. (a) The unknown signal f(t) and its Fourier transform $F(\omega)$.
(b) First iteration step starting with known segment $w_1(t)$.

FIGURE 5. (a) $F(\omega)$: The Fourier transform of the unknown signal f(t).
(b) $W_1(\omega)$: The Fourier transform of the given segment $w_1(t)$.
(c), (d): The result of the 12th and 100th iteration.

FIGURE 6. (a) $F(\omega)$: Unknown signal consisting of two impulses
contaminated by noise.
(b) $W_1(\omega)$: Fourier transform of the known segment $w_1(t)$.
(c), (d): The result of the 4th and 30th iteration.

Abstract

We have recently developed the mathematical tools to treat a certain class of
non-stationary processes that are of great interest. Most current spectral
estimation techniques implicitly make the somewhat tenuous assumption that the
observed processes are stationary; however, the most frequent case of sine waves in
(stationary) white noise is really nonstationary. Current approaches typically
randomize the phase in order to achieve at least a stationary ensemble. Although
the non-randomized processes are nonstationary, they possess features that are close
to stationarity. One of them is the fact that their covariances exhibit a finite so-called
"displacement rank" $\alpha$. Matrices with low displacement rank also appear in the
problem of fitting autoregressive (AR) models to sample covariances. The
significance of this low displacement rank is that it takes $O(n^2\alpha)$ instead of $O(n^3)$
operations to solve normal equations associated with fitting n-th order AR models
to such covariances.

Contract N00014-75-C-0601; National Science Foundation under Contract NSF Eng 75-16952;
and by ARPA through the use of the Stanford Artificial Intelligence Laboratory facilities.

I. Introduction


Spectral estimation has in the past received a lot of attention; however in recent
years, new approaches to spectral analysis have been developed and applied with
considerable success in many areas, such as speech processing, geophysical data
analysis, as well as sonar and radar signal enhancement. The bases for such
developments have been available for some time, for instance, in statistics through
the work of Parzen [Par] and in geophysics the work of Burg [Bur1]. Burg's
motivation was to choose the spectrum that corresponds to the most random or the
most unpredictable time series whose covariance function agrees with the known
values, say $\{ R_0, \ldots, R_N \}$, thus leading to the name maximum entropy
method (MEM). This procedure has been shown (see e.g. Van Den Bos [VDB]) to be
equivalent to computing the spectrum via the least-squares fitting of an
autoregressive (AR) (or all-pole) model to the given covariance, which was the
procedure suggested by Parzen. An AR process is one in which the best
least-squares prediction is given by a weighted sum of previous values.

The  problem  of  fitting  an  AR  model  to  a given  covariance  function  leads  to  the 
solution  of  a set  of  equations  known  as  the  Yule-Walker  equations.  These 
equations  can  be  solved  in  an  efficient  recursive  way  by  an  algorithm  first 
presented by Levinson (1942) and Durbin (1960) for the scalar case and then
extended to the vector case by Whittle (1963), Wiggins and Robinson (1965), and
Burg (1967) (see the references in [WR], [K-S74], [Mo1]). We refer to it as the
LWR algorithm. It recursively fits AR schemes of increasing orders. At each step,
the forward and backward one-step linear predictors (say of order p) are found as a
linear combination of the predictors of order p-1.

In the scalar case, it is known (see e.g. [Bur2], [MVK]) that the LWR algorithm
produces a sequence of reflection coefficients, having magnitude less than or
equal to one, which have a one-to-one correspondence with the given covariance.
Therefore, these numbers can be used to parametrize the spectrum of a process
directly. In addition, the reflection coefficients determine, through the LWR
algorithm, the parameters of the fitted AR scheme, or in other words the time
domain model. Recently we have shown in [MVLK] that for the vector case, we
can also obtain a sequence of matrix reflection coefficients, which have a
one-to-one correspondence with the given matrix covariance functions. This is
achieved by using a suitably normalized form of the LWR algorithm; a simple case
is given in Section II.

In practice, we do not have available the exact values of the covariance function
but instead we are only given a finite segment of a time series $\{ y_t \}$. A classical
method of fitting an AR scheme to the data is then to estimate the lagged
covariances by, say,

$$\hat{R}_k = \frac{1}{N-k} \sum_{t=1}^{N-k} y_{t+k}\, y_t^T$$

and  then  to  use  them  in  the  Yule-Walker  equation.  Burg's  technique  is  to  directly 
estimate  the  reflection  coefficients  from  the  data,  ignoring  the  covariance 
function.  These  estimates  are  obtained  by  minimizing  the  sum  of  the  squares  of  the 
forward  and  backward  one-step  prediction  errors  (or  innovations)  [Burl],  [Bur2]. 
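For the scalar case, the classical route described above (lagged covariance estimates followed by a recursive Yule-Walker solution) can be sketched as below. This is a textbook scalar Levinson-Durbin recursion written for illustration, not the normalized LWR algorithm of Section II. The sign convention for the reflection coefficients is an assumption, since conventions vary. Note also that the 1/(N-k) estimate is not guaranteed to give a positive definite covariance sequence.

```python
import numpy as np

def sample_autocov(y, max_lag):
    """Lagged covariance estimates with the 1/(N-k) normalization."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return np.array([np.dot(y[k:], y[:n - k]) / (n - k)
                     for k in range(max_lag + 1)])

def levinson(R, order):
    """Fit y_t + a_1 y_{t-1} + ... + a_p y_{t-p} = e_t from covariances R."""
    a = np.array([1.0])
    error = R[0]
    reflection = []
    for m in range(1, order + 1):
        acc = np.dot(a, R[m:0:-1])       # sum_i a_i R(m - i), i = 0..m-1
        k = -acc / error                 # reflection coefficient of order m
        padded = np.append(a, 0.0)
        a = padded + k * padded[::-1]    # order update of the predictor
        error *= 1.0 - k * k             # prediction-error variance update
        reflection.append(k)
    return a, np.array(reflection), error

# Exact covariances of y_t = 0.5 y_{t-1} + e_t with unit innovation variance
R_exact = (4.0 / 3.0) * 0.5 ** np.arange(3)
a, reflection, error = levinson(R_exact, order=2)
```

With exact covariances of a first-order process, the second reflection coefficient vanishes and the fitted model does not change when the order is raised to two.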

In most of the classical methods as well as Parzen's (and Burg's) technique, an
implicit assumption is made that the observed data are samples of stationary
processes. For instance in Parzen's (and Burg's) method, it is assumed that the AR
model that is to be fitted to the data represents a stationary process. However, in
most practical cases, the observed data are not necessarily AR and often also not
very stationary. More generally, one would like to fit a model that reflects as
closely as possible the underlying physics of the problem. For example, if (white)
measurement noise is added to the output of an AR model driven by some (white)
input process, a model of the observations that is driven by a single (white) noise,
the so-called "innovations model", will have a moving average part and is therefore
a more general autoregressive moving average (ARMA) model. Another example is
the case where a sine wave with fixed initial phase is represented as an undriven
second order AR process with fixed initial conditions. The output of this process is
clearly non-stationary. In linear problems the "natural" functions are complex
exponentials, leading to rational spectra and thus to ARMA type models. The
common case of "spectral lines" in white noise unfortunately is, strictly speaking,
non-stationary and not AR. Thus in general we need also to include spectral zeros,
or from a process point of view include the so-called MA (moving average) models.
The presence of zeros leads unfortunately to difficulties, as in many other areas
such as communications and control. It is possible however to embed the ARMA
problem in a (multichannel) AR model as shown in [Mo2]. We will concentrate
here on the simpler AR case.

In recent work [FMKL], [FKM], [Ka] and [MK], we have shown that the above
cases fall into a class of processes which we called $\alpha$-stationary. These processes
can be characterized by an index of the "distance from stationarity" of the process.
It turns out that matrices with low displacement rank also appear in the problem of
fitting autoregressive (AR) models to sample covariances. The significance of this
low displacement rank is that it takes $O(n^2\alpha)$ instead of $O(n^3)$ operations to solve
normal equations associated with fitting n-th order AR models to such covariances.
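To make the notion of displacement rank concrete, the sketch below uses one common definition, which is an assumption here because the text does not state it: the rank of $R - Z R Z^T$, where Z is the lower shift matrix. Under that definition a Toeplitz (stationary) covariance has rank at most 2, while an unstructured covariance generally does not. The matrices are arbitrary illustrations.

```python
import numpy as np

def displacement_rank(R, tol=1e-9):
    """Rank of R - Z R Z^T, with Z the lower shift matrix (assumed definition)."""
    R = np.asarray(R, dtype=float)
    shifted = np.zeros_like(R)
    shifted[1:, 1:] = R[:-1, :-1]          # Z R Z^T moves R one step down the diagonal
    return np.linalg.matrix_rank(R - shifted, tol=tol)

n = 6
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
toeplitz = 0.5 ** lags                     # stationary covariance R(i, j) = 0.5**|i - j|
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
generic = A @ A.T                          # covariance with no shift structure
rank_toeplitz = displacement_rank(toeplitz)
rank_generic = displacement_rank(generic)
```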

In Section II we will illustrate the basic ideas of the (normalized) LWR
algorithm,  and  in  Section  III  we  discuss  some  extensions  to  non-stationary 
processes  as  well  as  methods  to  treat  the  more  general  ARMA  models. 


II.  The  Normalized  LWR  Algorithm 

Suppose we are given the m x m matrices

$$R_n = E\{ y_t\, y_{t-n}^T \} , \qquad |n| \le N ,$$

where $\{y_t\}$ is an m-vector stationary random process, so that $R_{-n} = R_n^T$. Then the
so-called maximum entropy extension of the sequence $\{ R_n, |n| \le N \}$ is defined by
the expressions

$$R(z) = \sum_{n=-\infty}^{\infty} R_n z^{-n}
= A_N^{-1}(z)\, R_N^{\varepsilon}\, A_N^{-T}(z^{-1})
= B_N^{-1}(z)\, R_N^{r}\, B_N^{-T}(z^{-1}) \tag{1}$$

where $A_N(z)$ and $B_N(z)$ are transfer functions of the so-called forward and backward
prediction filters, and $R_N^{\varepsilon}$, $R_N^{r}$ are the respective prediction-error (or
innovation) variances. These quantities are defined by the equations (see e.g.
[WR], [K-S74]).

(2)

$$A_N(z) = I + A_{N,1} z^{-1} + \cdots + A_{N,N} z^{-N} \tag{3}$$

$$B_N(z) = B_{N,N} + B_{N,N-1} z^{-1} + \cdots + B_{N,1} z^{-N+1} + I z^{-N} \tag{4}$$

Note that (2) is just the (Yule-Walker) equation obtained in minimizing
$E\{ \varepsilon_{N,t}^T\, \varepsilon_{N,t} \}$ and $E\{ r_{N,t}^T\, r_{N,t} \}$ where

$$\varepsilon_{N,t} = y_t + A_{N,1} y_{t-1} + \cdots + A_{N,N} y_{t-N}$$

$$r_{N,t} = B_{N,N} y_t + \cdots + B_{N,1} y_{t-N+1} + y_{t-N}$$

are respectively the forward and backward prediction errors. These equations can
be solved in an efficient recursive manner by using the LWR algorithm ([WR],
[K-S74]). In [MVK], [MVLK] we have shown that the LWR can be put in the
normalized form given below. We only give the main results here leaving detailed
descriptions to [MVK], [MVLK].

The normalized LWR algorithm can be expressed in the following compact form

Here the $\rho_{n+1}$ are the reflection coefficients given by

P„+l  - <0'1/2  <^„r)'7’/2  (14) 
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Furthermore,  if  we  define 

e(n+,)<z)  - entl(z)  efl(z>  e„_,(z>  . . . e0 , 


(16) 


where 
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then  we  also  have 


3„+i(z) 

i 

- e(,'+1)(Z) 

. *„♦](*> . 

i _ 

The estimated multichannel spectrum is then given by

$$R(z) = A_N^{-1}(z)\, A_N^{-T}(z^{-1}) = B_N^{-1}(z)\, B_N^{-T}(z^{-1}) . \tag{18}$$


III.  Extensions  to  Nonstationary  and  ARMA  Models. 

In [MLVN], [MVL] we have developed recursive algorithms for finding linear
predictors, i.e. fitting AR type models, to data that are not stationary. These
algorithms are recursive in both time and order, and they are well suited to tracking
time-varying parameters of the underlying AR model. The reflection coefficients
mentioned earlier turned out to be a natural parametrization of these algorithms,
leading to realizations in so-called ladder canonical forms. These forms also have
several other nice features, such as lowest computational and storage requirements
as well as a "stability by inspection" property. Detailed descriptions of these ladder
forms are given in [MLVN], [MVL] and [Mo2]. In the Appendix we present some
examples out of a set of extensive simulations we performed, in order to
demonstrate how well they track time-varying parameters.

The extension of AR type modeling methods to the ARMA case has been
demonstrated in the system identification literature; a discussion of several such
methods can be found in [GP], [SLG], [MLK]. A direct way of demonstrating that
such an extension is possible can be found in [MLVN], [Mo2], where it is shown
that, by considering the joint process of observations and driving noise, the ARMA
modeling problem can be embedded in an AR modeling problem for the joint
process.

Appendix

The  data  used  in  this  simulation  was  generated  by  an  8th  order  AR  process  with 
time-varying  (piecewise  constant)  reflection  coefficients,  driven  by  the  sum  of  a 
Gaussian  white  noise  and  a non-Gaussian  impulse  train.  The  pulses  occur  at  the  step 
changes  in  the  parameters.  Figure  1 shows  the  data  generated  by  such  a model. 
Figures  2 and  3 show  the  actual  and  estimated  parameters  of  the  model.  The 
simulations  show  the  tracking  behaviour  that  the  ladder  form  is  capable  of. 
Figure 1. An 8th order AR process with time-varying (piecewise constant)
reflection coefficients that converge exponentially to zero. Process was
driven by sum of Gaussian white noise and an impulse train. There are 2000
samp

Figure 3(a). Sixth reflection coefficient, $K_6$, of underlying AR model.
Abstract


In this paper, a particularly efficient procedure for achieving improved
spectral estimates from incomplete observations of data sequences is
presented. This problem is relevant to such applications as Doppler radar
signal processing, where one needs to estimate Doppler frequency shifts based
upon a very small number of radar returns. The essence of the method is that
of appropriately estimating the unobserved (or missing) data and using the
enlarged data base to generate the improved spectrum estimate. Clearly, the
effectiveness of this method will be dependent on how well the missing data
can be estimated. Empirical evidence accumulated to date indicates that this
paper's procedure is effective as well as being computationally efficient.


I.  Introduction 


The spectral content of a continuous-time signal is of primary interest
in a variety of interdisciplinary applications. In particular, given the
signal $x(t)$, its spectrum is defined to be the magnitude of the associated
Fourier transform

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{1}$$

The behavior of the spectrum, $|X(\omega)|$, as a function of $\omega$ often provides
information otherwise not readily apparent in the original time signal. It is
with this in mind that a great deal of activity has recently been devoted to
developing spectrum estimation techniques applicable to situations in which
the signal $x(t)$ is not completely observable. This lack of complete
observability can result, for example, when one is able to observe the signal only
over a finite time interval even though the signal is itself defined for all
time, or when only discrete-time samples of the signal are provided. In
this paper, we shall be concerned with the task of estimating a signal's
spectral behavior given only a "finite" set of sampled values of the signal.
Spectral estimates based on these restrictions are of fundamental concern
in such practical applications as Doppler radar signal processing.

Let us first assume that the signal $x(t)$ is uniformly sampled every $\Delta$
seconds to generate the "infinite" length sequence $\{x(n\Delta)\}$ in which $n$ is the
integer valued discrete-time variable. The spectrum of this sequence is
formally obtained by evaluating its associated discrete-time Fourier
transform as defined by

$$\bar{X}(\omega) = \sum_{n=-\infty}^{\infty} x(n\Delta)\, e^{-j\omega n\Delta} \tag{2}$$

where the overbar is used to distinguish this transform from the corresponding
continuous-time Fourier transform (1). It is apparent that $\bar{X}(\omega)$ is a
periodic function of $\omega$ with period $2\pi/\Delta$. Moreover, it is also readily
established that the following relationship exists between the Fourier transforms
(1) and (2)

$$\bar{X}(\omega) = \frac{1}{\Delta} \sum_{k=-\infty}^{\infty} X(\omega - 2\pi k/\Delta) \tag{3}$$

Although this relationship is true for any choice of the sampling time
parameter $\Delta$, it is particularly meaningful when the continuous-time signal is
band-limited in the sense that $X(\omega) = 0$ for $|\omega| > \omega_1$, and when the sampling
time is selected to satisfy the Nyquist criterion $\Delta < \pi/\omega_1$. In this
important special case, it is clear from expression (3) that

$$X(\omega) = \Delta\, \bar{X}(\omega) \quad \text{for } |\omega| < \pi/\Delta \tag{4}$$

which implies that the continuous-time signal's Fourier transform can always
be recovered from the associated discrete-time signal's Fourier transform
under the stated conditions.

In any real-world application, however, it must be appreciated that one
has available only a finite number of samples upon which to base the
spectrum estimate. Specifically, there will be available only the following partial
observation (usually finite in number) of the underlying sequence

$$x(n\Delta) \quad \text{for } n \in \Lambda \tag{5}$$

where the "observation set" $\Lambda$ consists of an incomplete integer set (i.e.,
$\Lambda \neq \{n: -\infty < n < \infty\}$). When this observation set consists of a contiguous set
of integers (i.e., $\Lambda = \{n: n_1 \le n \le n_2\}$), then the resultant observed sequence
is a standard truncated version of the underlying infinite length sequence.
We shall not so restrict $\Lambda$, however, for we would then exclude such important
situations as: (1) when data elements are missing, or (2) when sequence
interpolation is required.




To estimate the spectrum of the underlying infinite length sequence
$\{x(n\Delta)\}$ from partial observation (5), a natural procedure would be to
evaluate the following partial Fourier transform

$$\bar{X}_{\Lambda}(\omega) = \sum_{n \in \Lambda} x(n\Delta)\, e^{-j\omega n\Delta} \tag{6}$$
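
As a rough illustration, the partial Fourier transform (6) can be evaluated
directly on any grid of frequencies. The Python sketch below assumes NumPy; the
function name `partial_dtft`, the default sampling interval of one, and the test
signal are hypothetical choices made only for this example.

```python
import numpy as np

def partial_dtft(samples, indices, omega, delta=1.0):
    """Sum over observed n of x(n*delta) * exp(-j*omega*n*delta)."""
    samples = np.asarray(samples, dtype=complex)
    indices = np.asarray(indices, dtype=float)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    # One row per frequency, one column per observed sample.
    kernel = np.exp(-1j * np.outer(omega, indices) * delta)
    return kernel @ samples

# Eleven observed samples of a complex exponential at frequency omega0.
obs = np.arange(-5, 6)
omega0 = 0.3
x_obs = np.exp(1j * omega0 * obs)
spectrum = partial_dtft(x_obs, obs, [omega0, 1.5])
```

At the true frequency the eleven terms add coherently, so the first entry of
`spectrum` equals the number of observations; away from it the sum is smaller
but not zero, which is the leakage that limits this estimate for short records.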

It is clear that when the observation set $\Lambda$ consists of all integers, then
this partial Fourier transform and the underlying Fourier transform (2) are
identical. Unfortunately, when $\Lambda$ consists of only a moderate sized set of
integers, as is typical in many applications, this estimate is generally of
poor quality. It is with this in mind that a variety of alternate spectrum
estimation procedures have been recently developed. These include the
essentially equivalent autoregressive and maximum entropy methods (e.g., see refs.
[1]-[3]) and various extrapolation techniques (e.g., see refs. [4]-[7]). In
this paper, we shall develop a procedure for estimating the behavior of the
underlying sequence outside the observation set $\Lambda$ with the objective of
obtaining an improved spectral estimate from the resultant enlarged data base.
The task to be considered is then given by the following

SEQUENCE RECONSTRUCTION PROBLEM: Let $\{x(n\Delta)\}$ be an $\Omega$
band-limited sequence in the sense that its Fourier
transform $\bar{X}(\omega)$ as given by expression (2) is such that¹

$$\bar{X}(\omega) = 0 \quad \text{for } \omega \notin \Omega \tag{7}$$

where $\Omega$ is a subset of the frequency interval $-\pi/\Delta < \omega < \pi/\Delta$
which has nonzero measure. Given the incomplete
observation

$$x(n\Delta) \quad \text{for } n \in \Lambda \tag{8}$$

of the infinite length sequence $\{x(n\Delta)\}$ where $\Lambda$ is an
incomplete set of integers, estimate values for the
unobserved portion of the infinite length sequence (i.e.,
find $x(p\Delta)$ for $p \notin \Lambda$).

In formulating the reconstruction problem in this manner, we are able to
simultaneously consider the apparently different special cases of low-pass,
band-pass, and high-pass sequences.²

¹The condition $\omega \notin \Omega$ is here meant to be all real numbers $\omega$ in the
interval $(-\pi/\Delta, \pi/\Delta)$ not contained in $\Omega$.

²The sets $\Omega$ which correspond to these special cases are $\Omega_1 = \{\omega: |\omega| < \omega_1\}$,
$\Omega_2 = \{\omega: \omega_0 < |\omega| < \omega_1\}$, and $\Omega_3 = \{\omega: \omega_1 < |\omega| < \pi/\Delta\}$, respectively.

This sequence reconstruction problem is somewhat ill-posed in the sense
that there exists an infinity of different $\pi/\Delta$ band-limited continuous-time
signals which possess the same incomplete observation (8). To establish this
claim, one need simply apply the well-known rule for reconstructing a
band-limited signal from its uniformly sampled version $\{x(n\Delta)\}$ where $\Delta$ has
been selected to satisfy the Nyquist criterion (e.g., see ref. [10], p. 29),
that is

$$x(t) = \sum_{n \in \Lambda} x(n\Delta)\, \frac{\sin[\pi(t - n\Delta)/\Delta]}{\pi(t - n\Delta)/\Delta} + \sum_{n \notin \Lambda} x(n\Delta)\, \frac{\sin[\pi(t - n\Delta)/\Delta]}{\pi(t - n\Delta)/\Delta} = x_o(t) + x_u(t) \tag{9}$$

We have suggestively decomposed this summation into two components which
reflect the "observed" and "unobserved" elements of the underlying sampled
sequence $\{x(n\Delta)\}$. Clearly, "any selection" for the unobserved samples (i.e.,
$x(n\Delta)$ for $n \notin \Lambda$) will not affect the behavior of the reconstructed signal (9)
at the observation times (i.e., $x(n\Delta) = x_o(n\Delta)$ for $n \in \Lambda$). Since there exists
an infinity of such unobserved selections, we have proven the following lemma.

Lemma 1: There exists an infinity of different $\pi/\Delta$
band-limited continuous-time signals which have
the specified incomplete observed sampled values (8).

The set of signals alluded to in this lemma is, in fact, the
linear variety (i.e., a translated subspace) given by

$$V_1 = x_o + M$$

where $x_o$ is given by the fixed observed first term on the right side of
relationship (9) while $M$ is the closed subspace spanned by the basis vectors
$\phi_n(t) = \Delta \sin[\pi(t - n\Delta)/\Delta]/\pi(t - n\Delta)$ for $n \notin \Lambda$.

The band-limited continuous-time signal which generated the observed
samples (8) is then known to lie in linear variety $V_1$. If we were to
uniformly sample each of the signals in $V_1$, there would result an associated
linear variety of infinite length sequences. Generally, only a small subset
of these sequences will be $\Omega$ band-limited in the sense (7), with the
overwhelming number not being band-limited at all (i.e., their Fourier transforms
being zero only on zero measure subsets of the interval $(-\pi/\Delta, \pi/\Delta)$). Our
interest is clearly confined to those continuous-time signals in $V_1$ which give
rise to infinite length sequences which are $\Omega$ band-limited. The proposed
reconstruction problem will be well-posed only if there exists exactly one such signal
in $V_1$.


II.  Vector  Space  Formulation 

It will be advisable to restate the reconstruction problem in a vector
space setting so as to make use of the many powerful methods of linear
operator theory. In particular, we shall direct our attention to the set of
sequences which possess finite energy. The set of finite energy sequences is
known to constitute a Hilbert space (e.g., see ref. [8]) which is denoted by
the symbol $\ell_2$ and is specified by

$$\ell_2 = \{x: \langle x, x \rangle \text{ is finite}\} \tag{10}$$

where we have chosen to represent the sequence $\{x(n\Delta)\}$ by the more compact
notation $x$. Furthermore, the operator $\langle \cdot, \cdot \rangle$ used in defining space $\ell_2$ is
the standard sequence inner product as defined by

$$\langle x_1, x_2 \rangle = \sum_{n=-\infty}^{\infty} x_1(n\Delta)\, x_2^*(n\Delta) \tag{11}$$

in which $x_2^*(n\Delta)$ denotes the complex conjugate of $x_2(n\Delta)$. It can be shown that
any sequence contained in Hilbert space $\ell_2$ possesses a Fourier transform as
given by relationship (2). Moreover, the generating sequence elements can be
recovered from the associated Fourier transform by means of the inverse
Fourier transform relationship

$$x(n\Delta) = \frac{\Delta}{2\pi} \int_{-\pi/\Delta}^{\pi/\Delta} \bar{X}(\omega)\, e^{j\omega n\Delta}\, d\omega \tag{12}$$

Our primary interest in Hilbert space $\ell_2$ will be concerned with the
subset of sequences that are band-limited relative to a given nonzero measure
frequency subset $\Omega$ of the interval $-\pi/\Delta < \omega < \pi/\Delta$. This sequence subset will
be denoted by $B(\Omega)$ in which

$$B(\Omega) = \{x \in \ell_2: \bar{X}(\omega) = 0 \text{ for } \omega \notin \Omega\} \tag{13}$$

where $\bar{X}(\omega)$ denotes the Fourier transform of sequence $x$. It is a relatively
simple matter to show that this subset is in fact a closed subspace of $\ell_2$.
In addition, there exists a companion closed subspace of $B(\Omega)$, known as its
orthogonal complement, which is defined by

$$B(\Omega)^{\perp} = \{x \in \ell_2: \bar{X}(\omega) = 0 \text{ for } \omega \in \Omega\} \tag{14}$$

Using the discrete-time version of Parseval's theorem, it immediately follows
that these two closed linear subspaces are orthogonal (i.e., $\langle x_1, x_2 \rangle = 0$ for
all $x_1 \in B(\Omega)$ and $x_2 \in B(\Omega)^{\perp}$).

Since the set $B(\Omega)$ is a closed subspace of $\ell_2$, the following direct sum
decomposition of Hilbert space $\ell_2$ is evident

$$\ell_2 = B(\Omega) \oplus B(\Omega)^{\perp}$$

This implies that any sequence $x \in \ell_2$ can be expressed uniquely as $x = x_1 + x_2$
with $x_1 \in B(\Omega)$ and $x_2 \in B(\Omega)^{\perp}$. For reasons which will shortly be made apparent,
a procedure for effecting this decomposition will now be given. This is
readily achieved by equivalently expressing the inverse Fourier transform
relationship (12) as follows

$$x(n\Delta) = \frac{\Delta}{2\pi} \int_{\omega \in \Omega} \bar{X}(\omega)\, e^{j\omega n\Delta}\, d\omega + \frac{\Delta}{2\pi} \int_{\omega \notin \Omega} \bar{X}(\omega)\, e^{j\omega n\Delta}\, d\omega = x_1(n\Delta) + x_2(n\Delta) \tag{15}$$

where the first integral has been set equal to $x_1(n\Delta)$ and the second to
$x_2(n\Delta)$. Clearly, the sequences $x_1$ and $x_2$ so generated will be contained in
the closed subspaces $B(\Omega)$ and $B(\Omega)^{\perp}$, respectively, and the required
decomposition has been made.


An examination of integral relationship (15) indicates that the
required sequence decomposition can, in fact, be achieved by passing the
sequence $x$ through the ideal $\Omega$ band-pass digital filter whose frequency
transfer function is given by

$$H(\omega) = \begin{cases} 1 & \text{for } \omega \in \Omega \\ 0 & \text{for } \omega \notin \Omega \end{cases} \tag{16}$$

In particular, the response of this ideal filter to the input sequence
$\{x(n\Delta)\}$ has the Fourier transform expression $H(\omega)\bar{X}(\omega)$ which is seen to be
identically zero for $\omega \notin \Omega$. Thus, this response sequence must be contained
in subspace $B(\Omega)$. With this in mind, let us then express the Fourier
transform of the sequence $x$ using the identity

$$\bar{X}(\omega) = H(\omega)\bar{X}(\omega) + [1 - H(\omega)]\bar{X}(\omega) \tag{17}$$

The sequence which corresponds to $H(\omega)\bar{X}(\omega)$ is then identified with $\{x_1(n\Delta)\}$
while that which corresponds to $[1 - H(\omega)]\bar{X}(\omega)$ is clearly equal to $\{x_2(n\Delta)\}$
since $1 - H(\omega)$ is identically one for $\omega \notin \Omega$ and zero for $\omega \in \Omega$.
Relationship (17) thereby yields the desired decomposition in the frequency domain.
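
The decomposition (15)-(17) can be illustrated numerically. The Python sketch
below works on a finite-length, periodic stand-in for the infinite sequence, so
that the ideal filter $H(\omega)$ becomes a 0/1 mask on DFT bins. This finite model,
the length 64, the chosen bins, and the function name `split_by_band` are
assumptions made for illustration and are not part of the paper's development.

```python
import numpy as np

def split_by_band(x, band_mask):
    """Split x into x1 (spectrum inside the band) and x2 (outside it)."""
    X = np.fft.fft(x)
    x1 = np.fft.ifft(band_mask * X)           # corresponds to H(w) X(w)
    x2 = np.fft.ifft((1.0 - band_mask) * X)   # corresponds to [1 - H(w)] X(w)
    return x1, x2

rng = np.random.default_rng(0)
n_points = 64
x = rng.standard_normal(n_points)

# Hypothetical band: DFT bins 3..6 and their negative-frequency mirrors.
band_mask = np.zeros(n_points)
band_mask[3:7] = 1.0
band_mask[-6:-2] = 1.0

x1, x2 = split_by_band(x, band_mask)
inner = np.vdot(x2, x1)   # sum of x1(n) * conj(x2(n)), as in (11)
```

Here `x1 + x2` reproduces `x` and `inner` vanishes to rounding error, mirroring
the direct sum decomposition and the orthogonality of the two subspaces.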


Observation  Operator 

To complete our vector space formulation, we shall now introduce two
linear operators defined on Hilbert space $\ell_2$. The first operator will be
appropriately referred to as the observation operator, $L$, which is
characterized by

$$y = Lx \tag{18a}$$

where

$$y(n\Delta) = \begin{cases} x(n\Delta) & \text{for } n \in \Lambda \\ 0 & \text{otherwise} \end{cases} \tag{18b}$$

Here, $\Lambda$ is the finite observation set used in defining the sequence
reconstruction problem. Clearly, the observation operator produces only a partial
observation of the sequence $x$ being operated upon and as such it is not an
invertible operator. Namely, given the sequence $y$, it is not possible to
uniquely recover the sequence $x$ which generated $y$ unless other constraints
on sequence $x$ are imposed.

Ideal  Band-Pass  Operator 

The second operator corresponds to the previously mentioned ideal $\Omega$
band-pass filter as characterized by transfer function (16). We shall put
this filtering operation into the more compact operator relationship

$$y = Px \tag{19a}$$

where the elements of the sequences $x$ and $y$ are related by the convolution
summation

$$y(n\Delta) = \sum_{k=-\infty}^{\infty} h(k\Delta)\, x([n-k]\Delta) \tag{19b}$$

The unit-impulse response sequence which characterizes this filter is simply
the inverse Fourier transform of the ideal transfer function (16), that is

$$h(n\Delta) = \frac{\Delta}{2\pi} \int_{\omega \in \Omega} e^{jn\omega\Delta}\, d\omega \tag{19c}$$

It is interesting to note that the set of $\Omega$ band-limited sequences in fact
corresponds to those sequences in $\ell_2$ which are eigen-sequences of operator $P$
with corresponding eigenvalue one, that is

$$B(\Omega) = \{x \in \ell_2: x = Px\}$$

It is now possible to reformulate the sequence reconstruction problem
using the vector space concepts presented in this section. Namely,

SEQUENCE RECONSTRUCTION PROBLEM: Let the sequence
$x \in B(\Omega)$, which in turn requires that

$$x = Px \tag{20a}$$

where $P$ is the ideal $\Omega$ band-pass operator (19).
Furthermore, let there be provided an incomplete
observation of this band-limited sequence as
specified by

$$y = Lx \tag{20b}$$

where $L$ is the observation operator (18). From
the incomplete observation sequence $y$, estimate
values for the unobserved portion of sequence $x$.

The reconstruction problem as here formulated has the potential of
being ill-posed due to the basic nature of the operator $L$ and subspace $B(\Omega)$.
This will be the case if the null space of operator $L$ as specified by

$$N(L) = \{x \in B(\Omega): Lx = \theta\} \tag{21}$$

is nontrivial, that is, if it contains more than the zero sequence $\theta$. We reach
this conclusion by noting that the solution to operator relationship (20) must
lie in the linear variety

$$V_2 = x + N(L) \tag{22}$$

where $x$ is any sequence which will satisfy expression (20). Clearly, if $N(L)$
contains more than the zero sequence, there will exist an infinity of
different sequences in $B(\Omega)$ which will satisfy relationship (20). Fortunately,
this potentially damaging ill-posedness will not be present when the subspace
$N(L)$ contains only the zero sequence, or when it is possible to impose
further restrictions on the class of $\Omega$ band-limited sequences to be considered
(e.g., sinusoidal sequences) which have the effect of causing $N(L)$ to contain
only the zero sequence. In any case, the investigator must appreciate the
potentially intrinsic ill-posed nature of the problem at hand when offering
solutions to the sequence reconstruction problem.

We shall conclude this section by making an important characterization
of the observation and ideal band-pass operators $L$ and $P$, respectively.
Namely, it is a relatively simple matter to show that they are each
idempotent (i.e., $L^2 = L$ and $P^2 = P$) and that their range and null spaces are
orthogonal (i.e., $R(L) \perp N(L)$ and $R(P) \perp N(P)$). As such, these two
operators which characterize the sequence reconstruction problem are orthogonal
projection operators. This characterization can play a most vital role in
any attempt at finding solutions to relationship (20).
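
The two projection properties can be checked numerically in a finite,
periodic model of the problem. In the Python sketch below the sequence length,
the observation set, the band, and the helper names `L` and `P` are hypothetical
choices; the ideal band-pass operator is approximated by a 0/1 mask on DFT bins,
which is an assumption of this sketch rather than the infinite-length operator
of the text.

```python
import numpy as np

n_points = 64
rng = np.random.default_rng(1)

obs_mask = np.zeros(n_points)
obs_mask[10:21] = 1.0            # hypothetical observation set: n = 10..20

band_mask = np.zeros(n_points)
band_mask[3:7] = 1.0             # hypothetical band (positive-frequency bins)
band_mask[-6:-2] = 1.0           # mirrored negative-frequency bins

def L(v):
    """Observation operator: keep observed samples, zero the rest."""
    return obs_mask * v

def P(v):
    """Band-pass operator realized as a 0/1 mask on DFT bins."""
    return np.fft.ifft(band_mask * np.fft.fft(v)).real

x = rng.standard_normal(n_points)
w = rng.standard_normal(n_points)

idempotent_L = np.allclose(L(L(x)), L(x))
idempotent_P = np.allclose(P(P(x)), P(x))
# Range and null space are orthogonal: <Lx, w - Lw> = 0 and <Px, w - Pw> = 0.
orth_L = abs(np.dot(L(x), w - L(w))) < 1e-9
orth_P = abs(np.dot(P(x), w - P(w))) < 1e-9
```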

III.  Sequence  Reconstruction  Procedure 

Recently, a signal reconstruction algorithm for obtaining a solution to
a more general version of operator relationship (20) has been developed [9].
This algorithm is based upon the method of successive corrections and takes
the form

$$x_n = x_{n-1} - PLx_{n-1} + P(Lx) \qquad n = 1, 2, 3, \cdots \tag{23}$$

where we have suggestively expressed the observed signal $y$ by its equivalent
$Lx$. The initial approximation sequence $x_0$ is required to lie in $B(\Omega)$ and is
typically selected to be the zero sequence. Using the fact that $P$ and $L$ are
orthogonal projection operators, it has been shown that this reconstruction
algorithm generates a sequence (of sequences) $\{x_n\}$ which converges in the
sense that (see ref. [9])

$$x_n \to x + u \tag{24}$$

The sequence $u$ is equal to the orthogonal projection of the initial
approximation error $x_0 - x$ onto the subspace consisting of $\Omega$ band-limited sequences
which lie in the null space of operator $L$, that is

$$N(L) = \{x \in B(\Omega): Lx = \theta\}$$

When the sequence $x_0 - x$ is nearly orthogonal to $N(L)$, it follows that
$u \approx \theta$ and the algorithm will converge to the desired result $x$. In general,
this may not be the case and the generated sequence $\{x_n\}$ can converge to
a sequence other than the original $x$. It has been empirically determined
that in many applications, the desired condition of $x_0 - x$ being orthogonal
to $N(L)$ is nearly met, thereby resulting in the desired convergence property.
It is important to note, however, that the effectiveness of proposed
algorithm (23) (or many other reconstruction procedures) will be dependent on the
nature of the class of $\Omega$ band-limited sequences to be considered.
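
The successive-correction iteration (23) is short enough to sketch directly.
The Python example below again uses a finite, periodic model in which $P$ is a
mask on DFT bins; the sequence length, the low-pass band, the choice of
observing only even-indexed samples, and the iteration count are all
hypothetical values selected so that the behavior is easy to verify.

```python
import numpy as np

n_points = 64
band_mask = np.zeros(n_points)
band_mask[:4] = 1.0              # hypothetical low-pass band: bins 0..3
band_mask[-3:] = 1.0             # and the mirrored bins -3..-1

obs_mask = np.zeros(n_points)
obs_mask[::2] = 1.0              # hypothetical observation set: even n only

def P(v):
    """Band-pass operator realized as a 0/1 mask on DFT bins."""
    return np.fft.ifft(band_mask * np.fft.fft(v)).real

def L(v):
    """Observation operator: keep observed samples, zero the rest."""
    return obs_mask * v

rng = np.random.default_rng(2)
x_true = P(rng.standard_normal(n_points))   # a band-limited test sequence
y = L(x_true)                               # incomplete observation y = Lx

x_est = np.zeros(n_points)                  # x_0 = zero sequence
for _ in range(50):
    x_est = x_est - P(L(x_est)) + P(y)      # successive correction (23)

max_error = np.max(np.abs(x_est - x_true))
```

In this particular configuration each pass halves the remaining error, so fifty
passes are ample. With sparser observations or wider bands the contraction
factor moves toward one and far more passes are needed, which is the slow
convergence that motivates a direct solution.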

Although reconstruction algorithm (23) will generate an approximation
sequence which has guaranteed convergence, the convergence rate can be
disappointingly slow. In recognition of this undesirable situation, a direct
solution procedure has been recently developed which will yield the desired
reconstruction [4]. Namely, one first finds the sequence $z \in R(L)$ (i.e.,
$z(n\Delta) = 0$ for $n \notin \Lambda$) which will satisfy the operator relationship

$$\text{Step 1:} \quad LPz = Lx \tag{25}$$

Once this sequence has been found, the reconstructed sequence is obtained by
operating on $z$ by $P$, that is

$$\text{Step 2:} \quad x + u = Pz \tag{26}$$

This two-step procedure will yield the reconstructed sequence so long as the
solution $z$ obtained from operator relationship (25) is such that

$$\lim_{k \to \infty}\, [LP]^k z = \theta \tag{27}$$

Any reasonably well-behaved sequence $z$ will eventually be annihilated by an
infinite number of sequential applications of the composite operator $LP$.
Thus, requirement (27) offers no real restriction in terms of finding a
solution $z$. This convenient reconstruction procedure has been found to be very
effective in all examples treated to date and avoids the typically slow
convergence rates so characteristic of signal reconstruction algorithms (e.g.,
[5] and [6]).


IV.  Sequence  Extrapolation  and  Interpolation 

Two relevant reconstruction tasks which have received much attention
in the literature are those of the extrapolation and interpolation of
band-limited sequences. We shall consider the specific case in which the
band-limit set $\Omega$ is given by

$$\Omega = \{\omega: \omega_0 < |\omega| < \omega_1\} \tag{28}$$

where $\omega_0$ and $\omega_1$ are fixed numbers contained in the interval $[0, \pi/\Delta)$. The
ideal unit-impulse response which corresponds to this $\Omega$ band-limit set is
given by expression (19c), that is

$$h(n\Delta) = \frac{\sin(\omega_1 n\Delta) - \sin(\omega_0 n\Delta)}{n\pi} \tag{29}$$

We shall now consider separately the extrapolation and interpolation
problems which correspond to this $\Omega$ band-limit set.


Extrapolation 

In the extrapolation task, the observation set will be taken to be the
contiguous set of integers

$$\Lambda = \{M, M+1, \cdots, N-1, N\} \tag{30}$$

where the integers $M$ and $N$ define the observation interval. With this
selection of $\Omega$ and $\Lambda$, operator relationship (25) becomes

$$\sum_{k \in \Lambda} \frac{\sin([n-k]\omega_1\Delta) - \sin([n-k]\omega_0\Delta)}{\pi(n-k)}\, z(k\Delta) = x(n\Delta) \quad \text{for } n \in \Lambda \tag{31}$$

where use of the fact that $z(k\Delta) = 0$ for $k \notin \Lambda$ has been made. We next solve
this consistent system of $N + 1 - M$ linear equations for the $N + 1 - M$
unknowns $z(k\Delta)$ with $k \in \Lambda$. Once this solution has been determined, the
desired extrapolation is achieved by incorporating relationship (26), that is

$$x(p\Delta) = \sum_{k \in \Lambda} \frac{\sin([p-k]\omega_1\Delta) - \sin([p-k]\omega_0\Delta)}{\pi(p-k)}\, z(k\Delta) \quad \text{for } p \notin \Lambda \tag{32}$$

In essence, this extrapolation requires the solution of the system of
$N + 1 - M$ linear equations (31) and then an evaluation of relationship (32)
for each extrapolation element desired. Clearly, the computational
requirements of this algorithm are minimal. The most time consuming aspect of this
procedure is that of solving relationship (31), which requires on the order
of $(N + 1 - M)^3$ multiplications (or even fewer if more specialized procedures
such as Levinson's method are used).
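
The two steps, solving (31) and then evaluating (32), can be sketched in a few
lines of Python with NumPy. The helper names `h` and `reconstruct`, the band
edges, the observation set, and the test sequence below are all hypothetical.
In particular, a wide band is chosen here so that the linear system is well
conditioned; these are not the parameter values of the example that follows in
the text, and a general dense solver is used in place of Levinson's method.

```python
import numpy as np

def h(m, w0, w1):
    """Band-pass unit-impulse response (29) at integer lag m.

    w0, w1 are the band edges already multiplied by the sampling interval
    (radians per sample); the m = 0 value is the limit (w1 - w0) / pi.
    """
    m = np.asarray(m, dtype=float)
    safe = np.where(m == 0, 1.0, m)          # avoid dividing by zero at m = 0
    vals = (np.sin(w1 * safe) - np.sin(w0 * safe)) / (np.pi * safe)
    return np.where(m == 0, (w1 - w0) / np.pi, vals)

def reconstruct(obs_idx, obs_vals, targets, w0, w1):
    """Step 1: solve (31) for z on the observation set. Step 2: apply (32)."""
    obs_idx = np.asarray(obs_idx)
    targets = np.asarray(targets)
    A = h(obs_idx[:, None] - obs_idx[None, :], w0, w1)
    z = np.linalg.solve(A, np.asarray(obs_vals, dtype=float))
    return h(targets[:, None] - obs_idx[None, :], w0, w1) @ z

# Assumed, illustrative parameters (wide band, short contiguous window).
w0, w1 = 0.2 * np.pi, 0.9 * np.pi
obs_idx = np.arange(0, 8)                    # observation set
targets = np.arange(8, 16)                   # points to extrapolate

# Test sequence of the form x = P z_true with z_true supported on the
# observation set, so the exact extrapolated values are known.
z_true = np.array([1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 0.75, -0.25])
x_obs = h(obs_idx[:, None] - obs_idx[None, :], w0, w1) @ z_true
x_future = h(targets[:, None] - obs_idx[None, :], w0, w1) @ z_true

x_hat = reconstruct(obs_idx, x_obs, targets, w0, w1)
```

Because `reconstruct` takes an arbitrary index set, the same sketch covers
observation sets with gaps. Narrow bands combined with short windows tend to
make the matrix in (31) poorly conditioned, in which case a least-squares or
regularized solve may be a safer choice than `np.linalg.solve`.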


Extrapolation  Example 

We shall now illustrate this extrapolation procedure by considering the
case where the parameter selections $\omega_1 = 2\pi/25\Delta$ and $\omega_0 = 0.8\,\omega_1$ are made.
Furthermore, the underlying continuous-time signal $x(t)$ to be extrapolated
is given by

$$x(t) = \sin(0.99\,\omega_1 t) + \sin(0.85\,\omega_1 t)$$

The sequence which results when this signal is uniformly sampled is seen to
be

$$x(n\Delta) = \sin(0.99\,n\omega_1\Delta) + \sin(0.85\,n\omega_1\Delta)$$

in which $\omega_1\Delta = 2\pi/25$. It is apparent that the spectrum of $\{x(n\Delta)\}$ is
entirely contained within the specified band-limit set
$\Omega = \{\omega: 0.8\,\omega_1 < |\omega| < \omega_1\}$. Thus, the extrapolation procedure as characterized
by relationships (31) and (32) is applicable. Finally, the observation set $\Lambda$ is
taken to be

$$\Lambda = \{n: -5 \le n \le 5\}$$

which corresponds to only 0.374 of one period of the lower frequency
sinusoidal component constituting $x(n\Delta)$.

Using relationship (31) with the above selection of $\omega_0$, $\omega_1$, and $\Lambda$,
there will result a system of 11 Toeplitz equations to solve for the eleven
unknowns $z(n\Delta)$ for $n \in \Lambda$. Using a modified version of the Levinson
algorithm, this system of equations is solved and that solution is inserted into
the extrapolation expression (32), which is then evaluated for $p = \pm 6, \pm 7,
\cdots, \pm 99$. The results of this procedure are shown in Figure 1, in which the
continuous line corresponds to the underlying continuous-time signal and the
crosses outside the observation window denote the extrapolations while those
within the window denote the eleven samples of $x(t)$ used in the
extrapolation procedure. The extrapolation procedure has evidently yielded virtually
exact results. If standard spectral estimation techniques such as the
autoregressive or maximum entropy procedures are applied to this extrapolated
sequence of length 199, the resultant spectrum possesses very narrow peaks
at radian frequencies $0.99\,\omega_1$ and $0.85\,\omega_1$. This is to be contrasted with the
spectrum estimates from the original unextrapolated sequence of length 11,
which are not well defined at all. This behavior is demonstrated in Figure
2.


Interpolation 

For ease of presentation, the observation set $\Lambda$ which corresponds to
the interpolation reconstruction problem is taken to be

$$\Lambda = \{M, M+q, M+2q, \cdots, M+(N-M)q\} \tag{33}$$

where $q$ is a positive integer greater than one and $M$ and $N$ are integers which
determine the number of observations made (i.e., $N + 1 - M$). It is apparent
from this observation set structure that we have available every $q$th sample
of the underlying sequence $\{x(n\Delta)\}$ commencing at discrete-time $M$ and
concluding at $M + (N - M)q$. With this choice of $\Lambda$, operator relationship (25)
becomes

$$\sum_{k \in \Lambda} h([n-k]\Delta)\, z(k\Delta) = x(n\Delta) \quad \text{for } n \in \Lambda \tag{34}$$

where $h(n\Delta)$ is specified by relationship (29). We now solve this consistent
linear system of $N + 1 - M$ equations in the $N + 1 - M$ unknowns $z(k\Delta)$ for
$k \in \Lambda$. Finally, this solution is substituted into relationship (26) to
obtain the desired reconstruction, that is

$$x(p\Delta) + u(p\Delta) = \sum_{k \in \Lambda} h([p-k]\Delta)\, z(k\Delta) \quad \text{for } p \notin \Lambda \tag{35}$$

Our interest in this "interpolation" problem is typically, though not
necessarily, confined to evaluating relationship (35) for integers $p$ which fill
in the gaps of the observation set $\Lambda$ (i.e., $p = M+1, M+2, \cdots, M+q-1,
M+q+1, \cdots$). As in the extrapolation problem, the computational
requirements of this interpolation procedure are modest. Namely, we solve the
system of $N + 1 - M$ linear equations (34) and then evaluate relationship (35)
for each interpolation element desired.


V.  Conclusion 


An efficient direct (non-iterative) procedure for reconstructing a band-limited
sequence from an incomplete observation of that sequence has been presented.
This method basically entails the solving of a system of consistent linear
equations where the number of equations and unknowns equals the number of
sequence observations. The potentially ill-posed nature of the underlying
problem as well as procedures for circumventing this difficulty were
discussed. The effectiveness of the proposed sequence reconstruction method
was demonstrated by means of an example.

Fig. 2 Spectrum Estimation Plots: (a) extrapolated sequence of
length 199 and a 15th order autoregressive model, (b) original
sequence of length 11 and a 5th order autoregressive model.
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Introduction 


In analyzing and characterizing experimental or numerical electromagnetic
response data, one desires to extract parameters that can be related to
physical characteristics of the system being studied. One obvious set of
physically related parameters is the complex resonances of the system and
their related coefficients. Indeed, spectral data usually lend themselves to
visual identification of these natural frequencies; the damping constants,
however, cannot be obtained as easily. Similarly, temporal data generally
allow one to determine visually the dominant natural frequency and, if
enough data are present, its damping constant.

Mains and Moffatt [1] introduced the concept of using complex natural
resonances of a target as a basis for target recognition. They made use of
the fact that a few natural resonances of a body are adequate to distinguish
the body within a finite collection of bodies. Baum [2] developed the
formalism known as the Singularity Expansion Method (SEM), which enables one
to write any electromagnetic response of a system as an expansion in these
complex resonances, or poles, and their residues. Both of the above methods
were based on obtaining the resonances from a set of equations which
characterized the electromagnetic response of the body, much as a circuit
theorist finds his resonances by solving a differential equation. The problem
still exists of how one obtains these resonances from response data and, in
particular, how to obtain the parameters from transient response data from
EMP simulators and transient radar ranges.

About four years ago Prony's algorithm [3] was once more dusted off and
applied to this problem of information extraction from electromagnetics data.
The first application at that time of Prony's method was to the numerically
generated transient current on a thin dipole. The results, which were reported
at the 1974 USNC/URSI meeting [4] by this author, gave a set of poles which
compared very closely, if not exactly, with the first ten even modes
previously calculated by Tesche [5]. As a result of this initial
demonstration, several researchers began studying Prony's method to determine
its utility for analyzing several kinds of transient data and to look for
solutions to some of the problems inherent in the process. In addition,
Brittingham, Miller and Willows [6] demonstrated that a procedure parallel to
the time domain Prony's method could be applied to frequency domain data.

Some  of  the  initial  questions  which  were  asked  about  Prony's  method  and 
which  are  in  part  still  being  studied  were: 

1.  Will Prony's method work if multiple poles are present?

2.  How does one determine a priori the order of the system?
    That is, how many poles are contained in the response data?

3.  What effects does noisy data have on the Prony algorithm?

4.  How do we ensure or know the accuracy of the poles returned?

5.  What direct applications does the Prony procedure have?

These questions were all addressed to some extent in this author's
dissertation [7]. It was found that Prony's method would work for the case of
multiple poles without any change in the pole searching algorithm. Two
methods were discovered by which the order of the system could be determined.
These methods are the Householder orthogonalization procedure and the
Eigenvalue method. Examples of these methods applied to clean and noisy data
can be found in this author's thesis [7] and appeared in the recent special
EMP issue of AP-S [8]. The applications of Prony's method seem to be
never-ending. These range from analysis of electrocardiograms [9] to radar
target recognition [10]. The most important and useful application made to
date has been to the analysis of EMP simulator data. The problem of noise and
Prony's method is a very complex one and will be discussed later in this
paper.

The remaining part of this paper will present a very brief introduction
to the mathematics involved in the Prony algorithm. The problem of noise and
its effect on Prony's method will be discussed and several solution methods
will be mentioned. Finally, some examples of the use of Prony's method will
be presented.

It should be kept in mind that this is intended as a review paper and
will necessarily be brief. Since most of the discussion here has been
detailed elsewhere, the relevant references will be cited as thoroughly as
possible.

The  Prony  Algorithm 

The Prony algorithm was first published in 1795 [3]. Since that time the
method has been described by many authors. A good description can be found
in Hildebrand [11]. The method is based on the fact that the system we are
modeling can be represented by

    R(t) = \sum_{i=1}^{N} A_i e^{s_i t}        (1)

where R(t) is the response, the s_i are the complex poles and the A_i are
the corresponding residues. The s_i can also be written as s_i = σ_i + jω_i.
The σ_i are thought of as the damping constants, and the ω_i are the natural
frequencies in radians per second. Since, in practice, the measured data will
appear as a set of discrete data, Equation (1) is rewritten as

    R(t_n) = R_n = \sum_{i=1}^{N} A_i e^{s_i n \Delta t},    n = 0, 1, ..., M-1        (2)

where Δt is the time step size and M is the total number of samples taken.
The set of Equations (2) is seen to be M nonlinear equations in 2N
unknowns. If M is equal to or greater than 2N, and if all Δt are equal,
then this nonlinear set of equations can be solved using the Prony algorithm.


Prony recognized that the R_n in (2) must satisfy a difference equation
of order N which may be written as

    \sum_{p=0}^{N} \alpha_p R_{p+k} = 0,    k = 0, 1, ..., \gamma-1        (3)

where γ is the value of M - N. The roots of the algebraic equation

    \sum_{p=0}^{N} \alpha_p z^p = 0        (4)

define the natural frequencies through

    z_i = e^{s_i \Delta t},    i = 1, 2, ..., N.        (5)

If α_N in Equation (3) is defined equal to 1, then the remaining α_p's
may be obtained by solving the equation

    \sum_{p=0}^{N-1} \alpha_p R_{p+k} = -R_{N+k}.        (6)

If 2N data samples are used, then Equation (6) can be solved exactly for the
α's. If more than 2N samples are desired, then one can use a least-squares
type fit to (6). Once the α_p have been found, the roots z_i = exp(s_i Δt)
of Equation (4) can simply be found, and the poles are then obtained as

    s_i = \frac{\ln z_i}{\Delta t}.        (7)

It is a simple procedure to obtain the residues, A_i, by solving the matrix
equation embodied in Equation (2) once the s_i are known.
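The steps of Equations (2) through (7) can be sketched in a few lines of
NumPy. This is only an illustrative sketch, not the author's program: the
function name `prony`, its arguments, and the synthetic test poles are all
assumptions made here, and the least-squares form of (6) is used so that
M may exceed 2N.

```python
import numpy as np

def prony(samples, n_poles, dt):
    """Least-squares Prony fit; returns (poles s_i, residues A_i)."""
    R = np.asarray(samples, dtype=complex)
    M, N = len(R), n_poles
    gamma = M - N                      # number of difference equations, Eq. (3)
    # Eq. (6): sum_{p=0}^{N-1} alpha_p R_{p+k} = -R_{N+k}, k = 0..gamma-1
    H = np.array([R[k:k + N] for k in range(gamma)])
    alpha, *_ = np.linalg.lstsq(H, -R[N:], rcond=None)
    # Eq. (4) with alpha_N = 1; np.roots expects the highest power first
    z = np.roots(np.concatenate(([1.0], alpha[::-1])))
    s = np.log(z) / dt                 # Eq. (7), principal branch of the log
    # Eq. (2) is linear in the residues once the z_i are known
    V = z[np.newaxis, :] ** np.arange(M)[:, np.newaxis]
    A, *_ = np.linalg.lstsq(V, R, rcond=None)
    return s, A

# Synthetic, noise-free record built from two assumed conjugate pole pairs.
dt = 0.05
s_true = np.array([-1 + 10j, -1 - 10j, -3 + 25j, -3 - 25j])
A_true = np.array([1.0, 1.0, 0.5, 0.5])
t = dt * np.arange(40)
R = (A_true * np.exp(np.outer(t, s_true))).sum(axis=1).real
poles, residues = prony(R, n_poles=4, dt=dt)
```

Because the logarithm in (7) is taken on its principal branch, the sketch
assumes that every natural frequency satisfies ω_i Δt < π.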

The  Noise  Problem 


The problem of Prony's method and noisy data became apparent the first
time noisy data was analyzed with the technique. This author's dissertation
[7] and two recent papers in the special EMP AP-S transactions [8][12] discuss
some of the typical effects which noise has on the results of the method.
Noise in the data tends to give very poor results if no attempt is made to
correct for its existence. It is likely that noise will perturb the extracted
poles in such a way that they are not at all similar to the true poles. Noise
almost always makes the damping constant σ_i too large. While the effects of
the noise are well known and documented, the cause of those effects is not
well understood. Don Dudley [13][14] has done an excellent study of some of
the reasons behind the noise problems. Obviously it is necessary to
understand the cause of these problems before we can apply any rational
procedures for correcting them. Following is a brief outline of what appears
to be the major effect of the noise.

The key step in Prony's method is the solution for the coefficients α_p
in the difference Equation (6), repeated here for convenience:

    \sum_{p=0}^{N-1} \alpha_p R_{p+k} = -R_{N+k},    k = 0, 1, ..., \gamma-1.        (6)

However, in the solution for the α_p we do not know the R_{p+k} exactly. The
measured data can be expressed as

    Y_k = R_k + e_k        (8)

where e_k is the error in the kth sample. Hence, Equation (6) can be
rewritten as

    \sum_{p=0}^{N-1} \alpha_p Y_{p+k} = -Y_{N+k} + W_k,    k = 0, 1, ..., \gamma-1        (9)

where

    W_k = \sum_{p=0}^{N} \alpha_p e_{p+k}.        (10)

The W_k are the residuals. In solving Equation (9) in a least squared error
sense, the W_k are minimized with respect to the α_p as

    \frac{\partial}{\partial \alpha_m} \sum_{k=0}^{\gamma-1} W_k^2 = 0,    m = 0, 1, ..., N-1        (11)

which gives the normal equations of least squares as

    \sum_{p=0}^{N-1} \alpha_p \sum_{k=0}^{\gamma-1} Y_{p+k} Y_{m+k}
        = -\sum_{k=0}^{\gamma-1} Y_{N+k} Y_{m+k},    m = 0, 1, ..., N-1.        (12)


If the W_k are independent and equally distributed with zero mean, then as
γ → ∞ the α_p converge to the true parameters. However, Equation (10)
clearly shows that the W_k are not independent but are correlated, which
results in biased estimates of the α_p. Dudley [14] has shown that as the
value of γ increases the α_p do converge but converge to the wrong answer.
It is this biased answer that causes the damping constants to be too large.
However, why they consistently come out too large as opposed to too small is
unknown. This biasing is probably the major cause of all the strange
results which are known to occur in Prony's method when noise is present.
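The bias can be seen in a small numerical experiment. The sketch below is
purely illustrative and is not taken from the cited studies: the single
pole pair, the record length, the noise level, and the helper name
`ls_alpha` are all assumptions chosen here. It solves Equation (9) by least
squares for a clean record and for 200 noisy copies, and compares the
averaged noisy coefficients with the true ones.

```python
import numpy as np

def ls_alpha(y, n_poles):
    """Least-squares solution of Eq. (9) for alpha_0 .. alpha_{N-1}."""
    gamma = len(y) - n_poles
    H = np.array([y[k:k + n_poles] for k in range(gamma)])
    alpha, *_ = np.linalg.lstsq(H, -y[n_poles:], rcond=None)
    return alpha

rng = np.random.default_rng(0)
dt, damping, omega = 1.0, -0.02, 0.5          # assumed pole: damping +/- j*omega
noise_std = 0.1                               # assumed noise level
n = np.arange(400)
clean = np.exp(damping * n * dt) * np.cos(omega * n * dt)

# True coefficients of z^2 + alpha_1 z + alpha_0 for the conjugate pair
z = np.exp((damping + 1j * omega) * dt)
alpha_true = np.array([abs(z) ** 2, -2.0 * z.real])

alpha_clean = ls_alpha(clean, 2)
trials = [ls_alpha(clean + noise_std * rng.standard_normal(n.size), 2)
          for _ in range(200)]
alpha_noisy = np.mean(trials, axis=0)

print("true :", alpha_true)
print("clean:", alpha_clean)
print("noisy:", alpha_noisy)
```

Averaging over many noise realizations does not bring the noisy estimate
back to the true coefficients, which is the behavior described above for
increasing γ.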

At present there appear to be as many methods for correcting for the
noise problem as there are researchers studying the Prony algorithm. These
methods include prefiltering or averaging the data, internally whitening the
residuals, and multiple processing with statistical analysis of the resulting
poles. The more successful methods will be briefly discussed here, but the
reader is encouraged to study the original works to get a complete
understanding of the procedures.


If the analyst is fortunate enough to have sufficient data he can apply
averaging or smoothing procedures to the data to lower the standard deviation
of the error. If N samples are averaged, then the new average data point
is obviously

    \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i

and the new standard deviation of this point is

    \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{N}}.        (13)

This says that in order to get an order of magnitude decrease in σ, N must
equal 100. Hence, large amounts of data are needed to get a significant
decrease in the noise level by averaging.

Measurement noise seems to manifest itself in Prony's method as high
frequency components in the data. Hence, Prony's method needs to find many
high frequency poles outside the band of the required results. This suggests
operating on the original data with a low pass filter to remove the high
frequency components. This basically is equivalent to averaging the data but
is implemented in a different fashion. One method of filtering which has been
used by Cordaro [15] with fair success is the two-pole Butterworth digital
filter. It is this author's opinion that much work still needs to be done to
find the optimum preprocessing or filtering techniques to use with Prony's
method. It will likely turn out that depending on the type of data being
studied, different processing schemes will have to be used.
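For readers unfamiliar with such a filter, the following is a minimal
sketch of one common two-pole Butterworth low-pass design, obtained with the
bilinear transform and frequency prewarping. The design actually used in
[15] is not described here, so the design route, the function names, and the
cutoff and sampling values are assumptions.

```python
import numpy as np

def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a two-pole Butterworth low-pass filter."""
    k = np.tan(np.pi * cutoff_hz / fs_hz)            # prewarped cutoff
    norm = 1.0 / (1.0 + np.sqrt(2.0) * k + k * k)
    b = np.array([k * k, 2.0 * k * k, k * k]) * norm
    a = np.array([1.0,
                  2.0 * (k * k - 1.0) * norm,
                  (1.0 - np.sqrt(2.0) * k + k * k) * norm])
    return b, a

def apply_filter(b, a, x):
    """Direct-form difference equation with zero initial conditions."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y

# Assumed example: 10 Hz cutoff at a 100 Hz sampling rate, unit step input.
b, a = butter2_lowpass(10.0, 100.0)
step_response = apply_filter(b, a, np.ones(500))
```

The filtered record, rather than the raw one, would then be passed to the
Prony fit.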

Even after filtering or averaging is used, there will still be noise in
the data. Hence, the residuals of Equation (10) will still be correlated
and the estimates of the difference equation coefficients α_p will be biased.
Two methods appear to be useful in correcting the problem of the biased
estimates. These methods are the repeated least squares method and the
iterative generalized least squares method. Space does not allow explanations
of these methods. The reader is referred to excellent reports by Dudley
[13][14] and Cordaro [15] which explain the details of these methods and show
examples of the results. The application of these procedures to Prony's
method warrants more research, but the early results are encouraging.

If large amounts of data are present, Prony's method can be applied to
several windows in the same data record. If M windows are used, each
resulting in N poles, then the result will be M x N poles. These
poles can be correlated to see which are true system poles and which are
poles due to the noise. The assumption is that the poles due to the noise
will shift wildly from window to window, while the true poles will stay
essentially unchanged. Once the true poles are determined by looking for
clumps of M poles, the mean value can be found by averaging. This technique
has been used fairly successfully by Hudson and Lager [16] and by this author
[17]. There appears to be one basic flaw in this technique, however. That is,
if there is noise in the data which biases the estimates of α_p, then the
true poles will be biased. Hence, the true poles will probably clump around
the wrong result. This aberration has not appeared to be a problem in
previous tests and needs to be studied in the future.
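The windowing idea can be sketched as follows. This is a hypothetical
illustration, not the procedure of [16] or [17]: the function names, the
window length and step, and the crude matching of poles between windows by
sorting on the imaginary part are all assumptions made here.

```python
import numpy as np

def prony_poles(y, n_poles, dt):
    """Poles from a least-squares solution of Eq. (6) and the roots of Eq. (4)."""
    gamma = len(y) - n_poles
    H = np.array([y[k:k + n_poles] for k in range(gamma)])
    alpha, *_ = np.linalg.lstsq(H, -y[n_poles:], rcond=None)
    z = np.roots(np.concatenate(([1.0], alpha[::-1])))
    return np.log(z.astype(complex)) / dt

def windowed_poles(y, n_poles, dt, window_len, step):
    """One row of poles per window, each row sorted by imaginary part."""
    rows = []
    for start in range(0, len(y) - window_len + 1, step):
        p = prony_poles(y[start:start + window_len], n_poles, dt)
        rows.append(p[np.argsort(p.imag)])
    return np.array(rows)

# Assumed noise-free record with one conjugate pole pair at -0.5 +/- j5.
dt = 0.1
t = dt * np.arange(100)
y = np.exp(-0.5 * t) * np.cos(5.0 * t)
poles = windowed_poles(y, n_poles=2, dt=dt, window_len=30, step=10)
mean_poles = poles.mean(axis=0)     # average of each clump
spread = poles.std(axis=0)          # small spread suggests a true pole
```

With noisy data and a model order larger than the true order, columns with
a small spread would be the candidates for true system poles, subject to
the bias caveat noted above.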

Examples  of  Prony's  Application 

Three previously published examples of the use of Prony's method will be
presented in this section. These examples were chosen because of some of the
insight which they give into the various attributes of the method. Two sets
of noisy data are presented. It is not the intention here to hide the
problems of Prony's method by showing only good examples; rather, it is hoped
that by showing some of the good, encouraging results more research will be
directed into trying to solve some of the problems that hinder the method.

Example  1 — From  Reference  [7] 

A 1.0 m dipole with a half-length-to-radius ratio of 100 was numerically
modeled and excited with a broadside incident Gaussian pulse. The
backscattered electric field was calculated as a function of time for uniform
resistive loading of 0, 125, 250, 500, 750 and 1683 ohms per meter. These
fields are shown in Figure 1. The loading of 1683 Ω/m was chosen because
Tesche [17] calculated that at this value the dipole becomes critically
damped, giving a double pole on the negative real axis. The trajectories of
the first seven even poles extracted using Prony's method are shown in
Figure 2. Note that for the value of 1683 Ω/m the first pole has split and
moved toward the origin and toward infinity, which indicates that this value
of loading gives an overdamped situation. This does not imply that Tesche's
value is wrong but points out the differences in our numerical models for the
dipole.


This  trajectory  plot  is  very  illuminating  in  that  it  shows  the  effect  of 
resistance  on  an  antenna  in  terms  of  the  natural  resonances.  This  example 
also  shows  that  accurate  poles  can  be  obtained  from  transient  data  that  does 
not  look  like  damped  sinusoidal  data.  This  example  shows  the  ideal  kind  of 
results  which  one  would  hope  to  get  from  Prony's  method.  The  results  not 
only  give  useful  parameters  for  the  waveforms,  they  also  tell  us  something 
about  the  physical  characteristics  of  the  antenna. 


Example  2 — From  References  [17]  and  [18] 


In this example a pulse driving function, Figure 3a, was used to illuminate
a dipole in a corner reflector, and the terminal voltage of the antenna
was measured as a function of time, Figure 3b. The measurements were taken
on the HDL transient antenna range [19]. Prony's method was used to fit the
driving function and the response function, with the results shown in Figures
3a and 3b, respectively. The closed form expression for the transfer function
of the antenna was obtained analytically by evaluating

    H(s) = \frac{\sum_i A_i / (s - s_i)}{\sum_j B_j / (s - s_j)}        (14)

where A_i and s_i are the residues and poles of the response voltage and
B_j and s_j are the residues and poles of the driving waveform. The inverse
Laplace transform of Equation (14) was taken to give the impulse response
H(t), which is plotted in Figure 3c. Also in Figure 3c is the impulse
response obtained by taking a conventional FFT of both the driving function
and the response function, dividing the latter by the former, and then
Fourier transforming back to the time domain.
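Evaluating the ratio in Equation (14) from two sets of Prony poles and
residues takes only a few lines. The sketch below uses hypothetical helper
names and a made-up analytic test case (a drive e^{-t} applied to a system
1/(s + 2)); it is not the HDL data of this example.

```python
import numpy as np

def partial_fraction_sum(s, residues, poles):
    """Evaluate sum_i A_i / (s - s_i) at each complex frequency in s."""
    s = np.atleast_1d(np.asarray(s, dtype=complex))
    residues = np.asarray(residues, dtype=complex)
    poles = np.asarray(poles, dtype=complex)
    return (residues / (s[:, np.newaxis] - poles)).sum(axis=1)

def transfer_function(s, resp_residues, resp_poles, drive_residues, drive_poles):
    """Ratio of response transform to drive transform, as in Eq. (14)."""
    return (partial_fraction_sum(s, resp_residues, resp_poles)
            / partial_fraction_sum(s, drive_residues, drive_poles))

# Made-up check: drive 1/(s+1), response 1/(s+1) - 1/(s+2), so H(s) = 1/(s+2).
s_axis = 1j * np.array([0.0, 1.0, 10.0])
H = transfer_function(s_axis, [1.0, -1.0], [-1.0, -2.0], [1.0], [-1.0])
```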

This example shows that Prony's method can be used to obtain a closed
form analytical expression of the transfer function of a structure from
experimental data. The poles in the resulting transfer function should then
be the true poles of the system. The poles are not plotted here because
there are no previous data to compare them with. In this example Prony's
method was used basically as a curve fitting scheme to give analytical
expressions for the transient response of the measured data. Since the curve
fit was in terms of complex exponentials, the analytical Laplace transforms
and deconvolution could be performed as in Equation (14). If a constrained
Prony's method had been used, then the numerator poles of Equation (14) would
contain the poles of the denominator and division would be easy. A
constrained Prony's method has been developed by this author but has not yet
been implemented.

Example  3 — From  Reference  [20] 

This example is taken from an MRC report [20] which contains several
similar examples and should be referred to for more detail. The data used
is the transient waveform measured by a D-dot sensor on the central body of
an electrical mockup of the FLTSATCOM satellite. The mockup was excited by
capacitive drive current injection to determine the electrical
characteristics of a satellite when excited by SGEMP. Figure 4a shows the
actual D-dot sensor data which was used with the Prony algorithm to produce
the curve fit shown in Figure 4b. From this Prony fit the data was
extrapolated to a later time, Figure 4c, than was available in the original
measurement. Note the low frequency ringing which is now apparent but which
was not visible in the original data of Figure 4a. Since the data was D-dot
data, the charge density on the body could be obtained by integrating the
analytical expression for the response. The result of this integration is
shown in Figure 4d. Finally, the Laplace transform of the D-dot data obtained
from the pole data is shown in Figures 4e and 4f. Note the dominant resonance
which appears at about 10 megahertz.
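Once a waveform is expressed as a sum of complex exponentials, the
extrapolation, integration, and Laplace transform mentioned above are all
closed-form operations on the poles and residues. The following sketch uses
hypothetical function names and a made-up pole pair rather than the
FLTSATCOM data, and it assumes no pole lies at the origin.

```python
import numpy as np

def waveform(t, residues, poles):
    """sum_i A_i exp(s_i t); may be evaluated beyond the measured record."""
    return (residues * np.exp(np.outer(t, poles))).sum(axis=1)

def integral(t, residues, poles):
    """Integral from 0 to t of the waveform: sum_i (A_i / s_i)(exp(s_i t) - 1)."""
    return ((residues / poles) * (np.exp(np.outer(t, poles)) - 1.0)).sum(axis=1)

def laplace(s, residues, poles):
    """Laplace transform of the waveform: sum_i A_i / (s - s_i)."""
    s = np.atleast_1d(np.asarray(s, dtype=complex))
    return (residues / (s[:, np.newaxis] - poles)).sum(axis=1)

# Made-up fit: exp(-t) cos(3t), i.e. residues 0.5 at poles -1 +/- j3.
A = np.array([0.5, 0.5], dtype=complex)
p = np.array([-1 + 3j, -1 - 3j])
t_late = np.linspace(0.0, 2.0, 20001)
extrapolated = waveform(t_late, A, p).real
charge_like = integral(t_late, A, p).real
spectrum = laplace(1j * np.linspace(0.0, 10.0, 101), A, p)
```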

This example shows that Prony's method can be applied to raw data to
obtain a late time extrapolation of the response not obtainable from the
measurement. It is also possible to perform an analytical integration on
the data and an analytical Laplace transform to obtain the spectral
characteristics of the response. Note that because of the truncated data
record it would have been very difficult to obtain an FFT of the data.

Conclusions


This paper has briefly given an historical survey of the application of
Prony's method to electromagnetics response data. The method itself was
presented, and some of the numerical difficulties which are encountered when
noise is present were discussed with suggestions for remedying these
problems. Finally, three examples of the application of Prony's method to
data for information extraction were presented. This paper has actually only
skimmed the surface of the method and its application and problems and is
intended as an introduction to the method. The reader is encouraged to study
the references listed.

It is the author's opinion that Prony's method is a very powerful tool
for data analysis and parameter estimation when skillfully employed. This
has been shown convincingly in the previous three examples. There are still
many unanswered questions which need to be studied, many of which relate to
the noise problems indicated in The Noise Problem section. More has been
learned about the method by pressing ahead and using it than by studying the
problem and waiting to apply it until all the problems have disappeared. The
games of parameter estimation and data analysis are not easy ones, and we
must make use of all available tools and learn to use the tools through
application and experimentation.

Figures


Figure 1. Backscattered fields for a 1 m dipole with
resistive loading. Δt = 6.9444 x 10^-11 seconds.

Figure 2. Trajectory of the poles for the uniformly resistive
loaded 1.0 m dipole. Note the poles, s = σ + jω,
are normalized by L = 1.0 meter, c = 3 x 10^8 m/s,
and π.

Figure 3. Data from the HDL transient antenna range for a dipole
in a corner reflector. The data was operated on by
Prony's method.

(a) Original driving waveform and Prony fit.

(b) Original measured response voltage and Prony fit.

(c) Calculated impulse response of the antenna.

Figure 4. (a) D-dot data from a FLTSATCOM electrical test; (b) operated
on by Prony's method to obtain a curve fit; (c) an extrapolation;
(d) analytical integration; and (e) & (f) the Laplace transform.

Abstract

Input-output responses of a network can be integrated to yield a
family of signals, called measurement signals. Application of the
pencil-of-functions theorem to this family yields, in a closed form, the
identified parameters of the network function. This method of black-box
modeling is suitable for systems where the responses can be integrated in
real time, or recorded in digital form for off-line processing.


INTRODUCTION 

Determining  the  model  of  a network  from  its  observed  input-output 
responses  represents  the  inverse  of  the  analysis  problem.  Interest  in  this 
arises  from  the  frequent  need  for  a relatively  simple  mathematical  description 
of  the  system  so  that  behavior  for  other  anticipated  inputs  may  be  predicted 
up  to  acceptable  accuracies.  Like  the  analysis  problem,  there  are  several 
approaches  available  in  the  literature  for  the  inverse,  or,  as  it  is  often 
called,  the  "identification"  problem  [1].  To  name  a few,  a)  Prony's  method 
[2],  b)  gradient  methods,  such  as  Newton  [3]  and  quasi-linearization  [4], 
c)  least-squares  and  generalized  least-squares  methods  [5], [6],  d)  maximum- 
likelihood  methods  [7], [8],  etc. 

All of the methods stated above possess certain advantages and, as
may be expected, certain disadvantages peculiar to each particular method.
Stated very broadly, sensitivity to noise, slow convergence to the solution,
and excessive computational complexity are some of the possible disadvantages.
The purpose of the present paper is to describe in a simple way the
identification method developed in reference [9]. The method offers the
advantages of mathematical simplicity, closed-form solution to the problem,
which is optimal in the generalized least-squares sense and suboptimal in the
strict least-squares sense, and relative robustness of the technique to noise.
The disadvantage of the method is that, unlike the maximum likelihood method,
the variances of additive noise, when present, must be known a priori
(since they are not estimated in the present method) in order that unbiased
parameter estimates may be computed.

As stated above, we now set out to describe the pencil-of-functions
method, without any proofs, and to illustrate it with some examples.
Discrete-time signals are chosen for the presentation here, although such
signals must often be obtained by sampling a continuous-time system.

PENCIL  OF  FUNCTIONS  METHOD 


Identification  Problem 

Given the input-output observations

    {u(k)}, {y(k)},    k = 0, 1, ..., K        (1)

arising from a physical system believed to be linear, finite order, it is
desired to find a system model

which best fits the observations, in some sense. A solution can be obtained
by use of the pencil-of-functions theorem as discussed below.

For convenience denote sequences {u(k)} and {y(k)} simply as u and
y, respectively. Also, denote the inner-product of two sequences as

    x \cdot y \overset{def}{=} \sum_{k=0}^{K} x(k) y(k)        (4)


Measurement  Sequences 

From the given sequences y and u we form the following set of
sequences, called measurement sequences:

    y_1(k)     = y(k)
    y_2(k)     = y_1(0) + y_1(1) + ... + y_1(k)
      ...
    y_{n+1}(k) = y_n(0) + y_n(1) + ... + y_n(k)        (5)

    u_1(k)     = u(k-1)
    u_2(k)     = u_1(0) + u_1(1) + ... + u_1(k)
      ...
    u_{n+1}(k) = u_n(0) + u_n(1) + ... + u_n(k)        (6)

where n is the order of the model desired. That is, n is the degree of the
network function H.

Note  that  these  sequences  represent  repeated  discrete  integrations  of 
the  observed  signals  y(k)  and  u(k),  respectively.  A schematic  representation 
is  given  in  Fig.  1 where  we  assume  the  additive  noise  to  be  zero,  so  that 
x(k)=y(k)  and  v(k)=u(k).  Correction  for  noise  is  discussed  in  [9]. 
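Since each measurement sequence is a running sum of the previous one,
Equations (5) and (6) reduce to repeated cumulative sums. The sketch below
is illustrative only; the function name is hypothetical, and it assumes
u(-1) = 0 so that u_1(0) = 0.

```python
import numpy as np

def measurement_sequences(y, u, n):
    """Return arrays with rows y_1..y_{n+1} and u_1..u_{n+1} of Eqs. (5), (6)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    u1 = np.concatenate(([0.0], u[:-1]))   # u_1(k) = u(k-1), assuming u(-1) = 0
    ys, us = [y], [u1]
    for _ in range(n):
        ys.append(np.cumsum(ys[-1]))       # y_{j+1}(k) = y_j(0) + ... + y_j(k)
        us.append(np.cumsum(us[-1]))
    return np.array(ys), np.array(us)

# Tiny made-up record, model order n = 2.
ys, us = measurement_sequences([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], n=2)
```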

Gram  Matrix 


Next form the following inner-product matrix

where we have used the notation N = n+1 for convenience. This (N+n) x (N+n)
dimensional matrix is the Gram matrix [10] of the (N+n) dimensional vector
sequence

    { f(k) },    k = 0, 1, ..., K        (8)

where

    f(k) = [ y_1(k), y_2(k), ..., y_N(k), u_2(k), ..., u_N(k) ]^T        (9)

To state this observation formally, we have

    F = \sum_{k=0}^{K} f(k) f^T(k)        (10)

Diagonal Cofactors

Denote  the  diagonal  cofactors  of  F as  Di : 

Di  = i,i  cofactor  of  F (11) 

Recall  that  the  i,i  cofactor  of  a square  matrix  is  the  determinant  of  the 
matrix  after  deleting  the  ith  row  and  the  ith  column. 
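Equations (10) and (11) can be sketched as below. The function names are
hypothetical, and the sequences are assumed to be stored as the rows of an
array (one row per component of f, one column per sample k).

```python
import numpy as np

def gram_matrix(f):
    """F = sum_k f(k) f(k)^T for an array whose columns are the vectors f(k)."""
    f = np.asarray(f, dtype=float)
    return f @ f.T

def diagonal_cofactors(F):
    """D_i = determinant of F with its ith row and ith column deleted."""
    F = np.asarray(F, dtype=float)
    m = F.shape[0]
    D = np.empty(m)
    for i in range(m):
        keep = [j for j in range(m) if j != i]
        D[i] = np.linalg.det(F[np.ix_(keep, keep)])
    return D

# Made-up examples.
F_small = gram_matrix([[1.0, 0.0], [1.0, 1.0]])
D_diag = diagonal_cofactors(np.diag([1.0, 2.0, 3.0]))
```

For the method described here, the rows of `f` would be y_1, ..., y_N
followed by u_2, ..., u_N, in the order given in Equation (9).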

Parameter  of  the  Network  Function 


The parameters of the network function are given by the square-roots
of D_i up to a multiplicative constant. That is

    [ \sum_{i=1}^{N} \sqrt{D_i} (1 - z^{-1})^{i-1} ] Y(z)
        = [ \sum_{i=1}^{n} \sqrt{D_{N+i}} z^{-1} (1 - z^{-1})^{i-1} ] U(z)        (12)

which can be normalized, by dividing by D = \sqrt{D_1} + ... + \sqrt{D_N}, so
that the leading coefficient becomes unity. Clearly the computed transfer
function becomes

    H(z) = \frac{ z^{-1} [ \sum_{i=1}^{n} \sqrt{D_{N+i}} (1 - z^{-1})^{i-1} ] / D }
                { [ \sum_{i=1}^{N} \sqrt{D_i} (1 - z^{-1})^{i-1} ] / D }        (13)


NUMERICAL  EXAMPLE 

Results  of  computer  simulation  on  a fourth  order  network  function  are 
presented. 

Example  1 . 

The network function considered is

    H(s) = \frac{ s^2 + 0.31(10^6) s + 0.003(10^{12}) }
                { s^4 + 0.804(10^6) s^3 + 1.4481(10^{12}) s^2 + 0.009686(10^{18}) s + 0.007056(10^{24}) }

         = \frac{ [s + 10^4] [s + 0.3(10^6)] }
                { [s^2 + 0.004(10^6) s + 0.0049(10^{12})] [s^2 + 0.8(10^6) s + 1.44(10^{12})] }

    s-poles: (-0.002 \pm j 0.0699714)(10^6)
             (-0.400 \pm j 1.131371)(10^6)

It was converted to a digital equivalent form (using the pole-zero transform
z = exp(sT) [11]) for computer simulation. With a sampling interval Δ = 0.5 μs,
the z-domain transfer function turns out to be

    H(z) = \frac{ 2.00 z^{-2} - 3.7114409 z^{-3} + 1.7128304 z^{-4} }
                { 1 - 3.379158 z^{-1} + 4.428628 z^{-2} - 2.718099 z^{-3} + 0.6689807 z^{-4} }
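The pole-zero mapping can be checked numerically from the quoted s-domain
values. In the sketch below each pole and zero is mapped through
z = exp(sT) with T = 0.5 μs and the polynomials are rebuilt from their
roots; the variable names are arbitrary, and the numerator gain of 2.00 is
simply taken from the expression above rather than derived.

```python
import numpy as np

T = 0.5e-6                                            # sampling interval, seconds
s_poles = 1e6 * np.array([-0.002 + 0.0699714j, -0.002 - 0.0699714j,
                          -0.400 + 1.131371j, -0.400 - 1.131371j])
s_zeros = np.roots([1.0, 0.31e6, 0.003e12])           # zeros of the s-domain numerator

# Coefficients of 1 + c1 z^-1 + c2 z^-2 + ... from the mapped roots
den = np.real(np.poly(np.exp(s_poles * T)))
num = 2.00 * np.real(np.poly(np.exp(s_zeros * T)))    # gain 2.00 taken from the text

print("denominator:", den)
print("numerator  :", num)
```

The printed coefficients can be compared directly with those of H(z) above;
the numerator coefficients multiply z^-2, z^-3 and z^-4.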

The system was excited by a +-square 5 μs pulse (see Fig. 2a).

The model identified by the proposed method is

    \hat{H}(z) = \frac{ 2.00 z^{-2} - 3.71150 z^{-3} + 1.7128 z^{-4} }
                      { 1 - 3.3792 z^{-1} + 4.4286 z^{-2} - 2.7181 z^{-3} + 0.66898 z^{-4} }

    s-poles: (-0.002 \pm j 0.0699714)(10^6)
             (-0.399 \pm j 1.131373)(10^6)

Using  the  inverse  of  the  pole-zero  transform,  the  s-domain  transfer 
function  can  be  obtained.  The  poles  turn  out  as  shown  above. 

The  response  of  the  model  and  the  actual  network  response  are  compared 
in  Fig.  2b. 

Example  2 

The output of the network in Example 1 was corrupted by zero mean
white noise such that the signal to noise ratio was 28 dB (see Fig. 3a).
Ignoring the presence of noise, the pencil-of-functions method was applied.
The model identified is

    \hat{H}(z) = \frac{ 2.0411 z^{-2} - 3.7880 z^{-3} + 1.7485 z^{-4} }
                      { 1 - 3.60440 z^{-1} + 5.1082 z^{-2} - 3.4019 z^{-3} + 0.89852 z^{-4} }

    s-poles: (-0.0011 \pm j 0.0702220)(10^6)
             (-0.1059 \pm j 1.120712)(10^6)

The  responses  of  the  model  and  the  actual  network  response  are  compared  in 
Fig.  3b.  It  is  interesting  to  note  that  the  imaginary  parts  of  the  poles  have 
been  identified  quite  accurately.  However,  the  general  degradation  in  the 
values  of  parameter  estimates  is  obvious. 

Two alternatives are available: a) we can accept the above model inasmuch
as it behaves close to the original model, relative to the test input(s)
used, or b) we may use noise corrections as described in [9] to obtain
parameter estimates much closer to the original. The results of the second
approach, although not presented here, do turn out close to the original
model.

TWO DIMENSIONAL SPECTRAL ESTIMATION*

ABSTRACT

The problem of maximum entropy spectral estimation in two dimensions is
considered. Unlike the one dimensional case, the two dimensional maximum
entropy spectrum cannot be obtained by solving two dimensional linear
equations of autoregression. An iterative algorithm is therefore considered.
Examples on fields whose spectra are composed of discrete lines or are
irrational are considered.


INTRODUCTION 

Two dimensional spectral estimation is the problem of estimating the
spectral density function of a random field whose covariances (or
autocorrelations) are given on a finite two dimensional window. Let
$\{u_{i,j}\}$ denote a zero mean, stationary, Gaussian random field whose
covariance function is defined as

$$r(m,n) = E\,u_{i,j}\,u_{i+m,\,j+n} \qquad (1)$$

Suppose $r(m,n)$ is given on a window $W = \{m,n;\ -p \le m \le p,\ -q \le n \le q\}$. The
spectral density $S(f_1,f_2)$ of $\{u_{i,j}\}$ is related to $r(m,n)$ by


*Presented at the Workshop on Spectral Estimation, held at Rome Air
Development Center, Griffiss Air Force Base, Rome, NY, May, 1978. Research
supported in part by Rome Air Development Center, Contract F30602-75-C-0122,
and Army Research Office, Research Triangle Park, North Carolina, under
research grant No. DAAG29-77-G0044.
$$r(m,n) = \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} S(f_1,f_2)\,e^{j2\pi(mf_1 + nf_2)}\,df_1\,df_2 \qquad (2)$$


MAXIMUM  ENTROPY  ESTIMATION 


The maximum entropy estimate maximizes

$$H = \iint \ln S(f_1,f_2)\,df_1\,df_2 \qquad (3)$$


under the linear constraints of (2) for $m,n \in W$. The result requires that
the Fourier series of $S^{-1}$ be truncated to terms within the window $W$, i.e.,

$$S(f_1,f_2) = \beta^2\Bigl[\sum_{m,n\in W} a_{m,n}\,e^{-j2\pi(mf_1 + nf_2)}\Bigr]^{-1} \qquad (4)$$


Thus, one has to solve for $a_{m,n}$ given $r_{m,n}$. In one dimension this problem
is equivalent to solving a set of linear equations of autoregression.
However, this is not the case in two dimensions. Suppose we introduce a causal
autoregressive representation

$$u_{i,j} = \sum_{\substack{m=0 \\ (m,n)\neq(0,0)}}^{p}\ \sum_{n=0}^{q} \alpha_{m,n}\,u_{i-m,\,j-n} + e_{i,j} \qquad (5)$$

where

$$E\,e_{i,j} = 0, \qquad E\,e_{i,j}\,e_{i+m,\,j+n} = \beta^2\,\delta_{m,0}\,\delta_{n,0} \qquad (6)$$


i.e., $\{e_{i,j}\}$ is a white noise field. Then, given $r_{m,n}$ for $m,n \in W$, one can
find $\alpha_{m,n}$, $m,n \in W_1$, $W_1 = \{m,n;\ 0 \le m \le p,\ 0 \le n \le q\}$, by solving a set of
linear equations

$$\mathcal{R}\,\alpha = -\beta^2\,\mathbf{1}, \qquad \beta^2 = 1/[\mathcal{R}^{-1}]_{1,1} \qquad (7)$$

where $\mathcal{R}$ is a $(p+1)\times(p+1)$ block Toeplitz covariance matrix of basic
dimension $(q+1)$, $\alpha$ is a vector of the unknown $\alpha_{m,n}$ with $\alpha_{0,0} \triangleq -1$, and
$\mathbf{1}$ is a unit vector with 1 in the first location and zeros elsewhere.
Solution of (7) for $\alpha$ does not ensure a stable representation of (5) even if
$\mathcal{R}$ is positive definite. Moreover, if the solution of (7) leads to a stable
representation of (5), the covariances realized by (5) (for finite p,q) over
the window W need not be the same as $\{r(m,n)\}$ over W.

The spectral density realized by (5) is given by

$$S_u(f_1,f_2) = \beta^2\,\Bigl|\sum_{m,n\in W_1} \alpha_{m,n}\,e^{-j2\pi(mf_1 + nf_2)}\Bigr|^{-2}$$


*This statement contradicts the implications in [1], where the solution of (2)
and (4) is proposed via (7) to (9).
where $R$ is simply the $(p+1)\times(p+1)$, positive definite Toeplitz matrix
$\{r(m-n)\}$. Here, the positivity of $R$ is enough to ensure stability of (11)
and to yield from (11) the same covariances as the given $\{r(m)\}$ for
$-p \le m \le p$.


ITERATIVE  SOLUTION 

To obtain the spectrum $S(f_1,f_2)$ we have to solve the nonlinear equations
(2) and (4) for $m,n \in W$. To solve this problem we use an extension of an
algorithm proposed by Censor et al. [2], addressed to maximizing $\sum_j \ln x_j$
such that $x > 0$ and $Ax = b$, where $A$ is an $m \times n$ matrix. The solution is
found to be

$$x_j = \frac{1}{\sum_i a_{ij}\,u_i} \qquad (13)$$

where $u$ is the dual vector of $x$.

The iteration algorithm cyclically varies the coordinates of $u$ so that
only one component is changed and the dual functional decreases at each
iteration step.

In our problem then, we have

$$S(k,\ell) = \Bigl[\sum_{m,n\in W} a_{m,n}\,W_N^{mk}\,W_N^{n\ell}\Bigr]^{-1}, \qquad W_N = e^{-j2\pi/N} \qquad (14)$$

where we could consider $a_{m,n}$ as our dual variables.

In the algorithm we cyclically vary $m$ and $n$ over the window $W$ so that
only one component $a_{i,j}$ is changed at each iteration.

At the $(q+1)$th iteration, where we change $a_{i,j}$,

$$\frac{1}{S^{q+1}_{k,\ell}} = \sum_{m,n\in W} a^{q+1}_{m,n}\,W_N^{(mk+n\ell)} = \sum_{\substack{m,n\in W \\ (m,n)\neq(i,j)}} a^{q}_{m,n}\,W_N^{(mk+n\ell)} + a^{q+1}_{ij}\,W_N^{(ik+j\ell)}$$

$$= \sum_{m,n\in W} a^{q}_{m,n}\,W_N^{(mk+n\ell)} + W_N^{(ik+j\ell)}\bigl[a^{q+1}_{ij} - a^{q}_{ij}\bigr] \qquad (15)$$


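Hence,

$$S^{q+1}_{k,\ell} = \frac{S^{q}_{k,\ell}}{1 + S^{q}_{k,\ell}\bigl[a^{q+1}_{ij} - a^{q}_{ij}\bigr]W_N^{ik}\,W_N^{j\ell}} \qquad (16)$$

The value of $a^{q+1}_{ij} - a^{q}_{ij}$ is calculated from the constraint

$$f(z) = r(i,j) - \frac{1}{N^2}\sum_{k}\sum_{\ell} S^{q+1}_{k,\ell}\,W_N^{-ik}\,W_N^{-j\ell} = 0 \qquad (17)$$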


This equation can be solved for $z = a^{q+1}_{ij} - a^{q}_{ij}$ in a few iterations of the
Newton-Raphson technique. The function $f(z)$ has many zeros and it must be
ensured that we pick that value of $z$ that satisfies

$$1 + S^{q}_{k,\ell}\,W_N^{ik}\,W_N^{j\ell}\,z > 0$$

The equations involved are all reduced to real equations by symmetry
properties, i.e.,

$$a_{m,n} = a_{-m,-n}; \qquad a_{m,-n} = a_{-m,n}$$

and

$$S_{k,\ell} = S_{-k,-\ell} \quad \text{and} \quad S_{-k,\ell} = S_{k,-\ell}$$

EXAMPLE 

We considered the following covariance model [3]:

$$r(k,\ell) = \cos 2\pi(0.05k + 0.2\ell) + 0.5\cos 2\pi(0.2k + 0.05\ell) + 0.2\,\delta(k,\ell)$$
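
For readers who want to reproduce the input to the algorithm, the covariance model can be tabulated on the window W as follows. This is only an illustrative sketch assuming NumPy; the function names are hypothetical, and the last term is read as 0.2 δ(k,ℓ), i.e. white noise of variance 0.2 added at lag (0,0).

```python
import numpy as np

def r_model(k, l):
    """Covariance model of the example: two sinusoidal components plus white noise."""
    value = (np.cos(2 * np.pi * (0.05 * k + 0.2 * l))
             + 0.5 * np.cos(2 * np.pi * (0.2 * k + 0.05 * l)))
    if k == 0 and l == 0:
        value += 0.2          # delta term, present only at zero lag
    return value

def covariance_window(p, q):
    """Table of r(m,n) for -p <= m <= p, -q <= n <= q (row index m+p, column index n+q)."""
    return np.array([[r_model(m, n) for n in range(-q, q + 1)]
                     for m in range(-p, p + 1)])
```

The table is symmetric under (m,n) -> (-m,-n), which is the property the text uses to reduce the iteration to real arithmetic.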
Figure 1 shows the estimated spectra where the larger peak has an
amplitude of 35 dB and the smaller an amplitude of 8 dB. Other
examples and numerical tradeoffs of the proposed algorithm will be
presented at the conference.

Abstract

The  lattice  structure  offers  a convenient  visual  realization 
of  the  Levinson  recursion.  Based  on  the  lattice,  a number  of 
methods  have  been  developed  recently  for  all-pole  or 
autoregressive  spectral  estimation.  Of  particular  importance  are 
methods  that  do  not  require  windowing  of  the  data,  especially  for 
short  data  lengths.  This  paper  focuses  on  the  importance  of  the 
lattice  as  a tool  in  spectral  estimation.  Various  lattice 
methods are presented and compared, including adaptive
estimation.

1. Introduction

All-pole or autoregressive spectral estimation, which is a
special case of linear prediction analysis, has become popular as
a simple, effective method of spectral estimation. In this
paper, we first review the two basic methods of linear prediction
analysis, the autocorrelation and covariance methods [1], where
the first assumes that the signal is windowed (i.e., the signal
is zero outside the range of interest), and the second makes no
assumptions about the signal outside the range of interest.
Then,  we  present  the  lattice  structure  and  the  various  methods 
that  can  be  derived  from  it.  We  shall  see  that  the  methods  of 
Itakura  [2]  and  Burg  [3]  are  special  cases  of  the  general  class 
of  methods  to  be  presented.  Cases  where  the  latter  two  methods 
give  incorrect  results  will  be  given,  as  well  as  suggestions  of 
which  methods  might  be  more  appropriate. 

In the sequel we shall assume that the signal spectrum is to
be modelled by an all-pole transfer function

$$H(z) = G/A(z) \qquad (1)$$

where

$$A(z) = 1 + \sum_{k=1}^{p} a(k)\,z^{-k} \qquad (2)$$

is the "inverse filter" and G is a gain factor. Whenever the
word "stable" is used below, it shall refer to the stability of
the filter H(z), implying that H(z) and A(z) are minimum-phase,
and that the poles of H(z) (and zeros of A(z)) are inside the
unit circle.

2. Linear Prediction

In all-pole linear prediction analysis we assume that the
signal $x(n)$, $0 \le n \le N-1$, can be approximated as a weighted linear
summation of past samples:

$$x(n) \approx \hat{x}(n) = -\sum_{k=1}^{p} a(k)\,x(n-k) \qquad (3)$$

where $a(k)$ are the predictor coefficients and $p$ is the order of
the predictor. The coefficients $a(k)$ are computed as the result
of minimizing the total squared value of the error
$e(n) = x(n) - \hat{x}(n)$:

$$E = \sum_{n} e^2(n) = \sum_{n}\Bigl[x(n) + \sum_{k=1}^{p} a(k)\,x(n-k)\Bigr]^2 . \qquad (4)$$

There  are  two  cases  of  interest. 

a) Windowed Case

Here we assume that $x(n) = 0$ for $n < 0$ and $n \ge N$; then minimization of
(4) results in the normal equations of the "autocorrelation method":

$$\sum_{k=1}^{p} a(k)\,R(i-k) = -R(i), \qquad 1 \le i \le p, \qquad (5)$$

and

$$E_p = R(0) + \sum_{k=1}^{p} a(k)\,R(k) \qquad (6)$$

where

$$R(i) = \sum_{n=|i|}^{N-1} x(n)\,x(n-|i|) \qquad (7)$$

is the signal autocorrelation, and $E_p$ is the minimum total error
(or residual). (The shape of the window to be used in
multiplying the signal is of importance, but will not be discussed
here.)


The autocorrelation matrix $\{R(i-k)\}$ in (5) is Toeplitz, and
the set of equations may be solved using the algorithm developed
by Levinson, Robinson and Durbin [1]. We shall simply refer to
it as the Levinson recursion:

$$E_0 = R(0) \qquad (8a)$$

$$K_m = -\Bigl[R(m) + \sum_{k=1}^{m-1} a_{m-1}(k)\,R(m-k)\Bigr]\Big/ E_{m-1} \qquad (8b)$$

$$a_m(m) = K_m, \qquad a_m(k) = a_{m-1}(k) + K_m\,a_{m-1}(m-k), \quad 1 \le k \le m-1 \qquad (8c)$$

$$E_m = (1 - K_m^2)\,E_{m-1} \qquad (8d)$$

Equations (8b)-(8d) are solved recursively for m=1,2,...,p. The
final solution is given by $a(k) = a_p(k)$, $1 \le k \le p$. Note that in
obtaining the solution for order p, one computes the solutions
for all predictors of order less than p. We shall see below how
(8) is implemented in lattice form. In particular, the
reflection or partial-correlation coefficients will be the
parameters of the lattice. One can show that the following
condition for $K_m$ in (8b) holds:

$$|K_m| < 1, \qquad 1 \le m \le p, \qquad (9)$$

which guarantees the stability of the filter H(z).
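
The recursion (8a)-(8d) translates almost line for line into code. The following is a minimal sketch assuming NumPy; the function name and return convention are illustrative choices, not part of the paper.

```python
import numpy as np

def levinson(R, p):
    """Levinson recursion (8a)-(8d).

    R : autocorrelations R(0), ..., R(p)
    Returns (a, K, E) with a = [1, a_p(1), ..., a_p(p)],
    K = [K_1, ..., K_p] and E = E_p.
    """
    R = np.asarray(R, dtype=float)
    a = np.zeros(p + 1)
    a[0] = 1.0
    K = np.zeros(p)
    E = R[0]                                   # (8a)
    for m in range(1, p + 1):
        # R(m) + sum_{k=1}^{m-1} a_{m-1}(k) R(m-k)
        acc = R[m] + np.dot(a[1:m], R[m - 1:0:-1])
        k_m = -acc / E                         # (8b)
        prev = a.copy()
        a[m] = k_m                             # (8c), first part
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]  # (8c), second part
        E = (1.0 - k_m * k_m) * E              # (8d)
        K[m - 1] = k_m
    return a, K, E
```

For the autocorrelation R(i) = c^|i| of a first-order process, the sketch returns a(1) = -c and a second reflection coefficient of zero, as expected.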
b) Unwindowed Case

Here we make no assumptions about the signal outside the
given range. From (4) one can see that in order not to need data
outside the given interval, the summation should go from n=p to
n=N-1. The minimization of (4) results in the normal equations
of the "covariance method":

$$\sum_{k=1}^{p} a(k)\,R(k,i) = -R(0,i), \qquad 1 \le i \le p, \qquad (10)$$

and

$$E_p = R(0,0) + \sum_{k=1}^{p} a(k)\,R(0,k), \qquad (11)$$

where

$$R(k,i) = \sum_{n=p}^{N-1} x(n-k)\,x(n-i) \qquad (12)$$

is the covariance of the signal. The covariance matrix $\{R(k,i)\}$
is not Toeplitz and so (10) cannot be solved using the Levinson
recursion. Note that the covariance method is reminiscent of
Prony's method.
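
A direct way to see (10)-(12) at work is to build the covariance matrix and solve the normal equations with a general linear solver. The sketch below assumes NumPy; the function names are illustrative, and no stability check is performed.

```python
import numpy as np

def signal_covariance(x, p):
    """R(k,i) = sum_{n=p}^{N-1} x(n-k) x(n-i) for 0 <= k,i <= p, eq. (12)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    R = np.zeros((p + 1, p + 1))
    for k in range(p + 1):
        for i in range(p + 1):
            R[k, i] = np.dot(x[p - k:N - k], x[p - i:N - i])
    return R

def covariance_method(x, p):
    """Solve the normal equations (10) and evaluate the residual (11)."""
    R = signal_covariance(x, p)
    # R is symmetric, so the system matrix for a(1..p) is R[1:, 1:].
    a = np.linalg.solve(R[1:, 1:], -R[0, 1:])
    E = R[0, 0] + np.dot(a, R[0, 1:])
    return np.concatenate(([1.0], a)), E
```

With only three samples of x(n) = c^n and p = 1, or four samples of the sequence 1, 1, 0, -1 and p = 2, the sketch recovers the generating coefficients exactly with zero residual.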

Comparison 

In  the  autocorrelation  (windowed)  method,  the  resulting 
all-pole  filter  is  stable,  at  a cost  of  windowing  of  the  signal, 
which  results  in  a loss  of  frequency  resolution,  especially  for 
small  N.  On  the  other  hand,  the  covariance  (unwindowed)  method 
has  the  advantage  of  no  windowing,  but  filter  stability  is  not 
guaranteed,  especially  for  small  N.  We  now  present  the  lattice 
methods  and  see  how  some  of  these  problems  may  be  alleviated. 

3.  Lattice  Formulation 

The lattice was developed by Itakura [2] for use in the
analysis and synthesis of speech. The development here follows
that of the author [4]. Fig. 1 shows the basic lattice used in
the analysis. From Fig. 1, the following relations hold:

$$f_0(n) = g_0(n) = x(n) \qquad (13a)$$

$$f_m(n) = f_{m-1}(n) + K_m\,g_{m-1}(n-1) \qquad (13b)$$

$$g_m(n) = K_m\,f_{m-1}(n) + g_{m-1}(n-1) \qquad (13c)$$

where $f_m(n)$ is the "forward" residual at stage m, and $g_m(n)$ is
the "backward" residual. Taking the z-transform of (13), one
obtains a similar recursion with $F_m(z)$ and $G_m(z)$ as the
z-transforms of $f_m(n)$ and $g_m(n)$, respectively. Defining the
transfer functions to stage m as

$$A_m(z) = F_m(z)/X(z), \qquad B_m(z) = G_m(z)/X(z) \qquad (14)$$

one obtains the following recursive relations:

$$A_0(z) = B_0(z) = 1 \qquad (15a)$$

$$A_m(z) = A_{m-1}(z) + K_m\,z^{-1}B_{m-1}(z) \qquad (15b)$$

$$B_m(z) = K_m\,A_{m-1}(z) + z^{-1}B_{m-1}(z). \qquad (15c)$$

Let the forward transfer function at stage m be given by

$$A_m(z) = \sum_{k=0}^{m} a_m(k)\,z^{-k}, \qquad (16a)$$

then from (15) one can show that $B_m(z)$ will be the corresponding
reverse polynomial:

$$B_m(z) = z^{-m}A_m(z^{-1}) = \sum_{k=0}^{m} a_m(m-k)\,z^{-k}. \qquad (16b)$$

From (15) and (16) we also have:

$$a_m(0) = 1, \qquad a_m(m) = K_m. \qquad (17)$$

By comparing (15) and (17) with (8c), we see that (15) is the
heart of the Levinson recursion.

Given the polynomial $A_p(z)$, with $a_p(0) = 1$, one can generate
all the polynomials $A_m(z)$, $m < p$, and the coefficients $K_m$, using
the following reverse recursion derived from (15):

$$A_{m-1}(z) = \bigl[A_m(z) - K_m\,B_m(z)\bigr]/(1 - K_m^2) \qquad (18)$$

along with (16b). $A_p(z)$ and $K_m$, $1 \le m \le p$, are uniquely related iff
$A_p(z)$ is minimum phase and, hence, (9) is true. In that case,
$B_p(z)$ is maximum phase, as one can see from (16b).
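
The reverse recursion (18) doubles as a stability test, since the filter is stable exactly when every recovered reflection coefficient obeys (9). A minimal sketch, assuming NumPy and using illustrative function names:

```python
import numpy as np

def step_down(a):
    """Recover K_1..K_p from a = [1, a_p(1), ..., a_p(p)] using (17) and (18)."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    K = np.zeros(p)
    for m in range(p, 0, -1):
        k_m = a[m]                      # a_m(m) = K_m, eq. (17)
        K[m - 1] = k_m
        if abs(k_m) == 1.0:
            raise ValueError("|K_m| = 1: the recursion (18) cannot be continued")
        b = a[::-1]                     # coefficients of B_m(z), eq. (16b)
        a = ((a - k_m * b) / (1.0 - k_m * k_m))[:m]   # A_{m-1}(z), eq. (18)
    return K

def is_stable(a):
    """True if all reflection coefficients obey (9), i.e. A(z) is minimum phase."""
    try:
        return bool(np.all(np.abs(step_down(a)) < 1.0))
    except ValueError:
        return False
```

For example, A(z) = 1 - 0.65 z^-1 + 0.3 z^-2 gives K_1 = -0.5 and K_2 = 0.3, whereas A(z) = 1 - 2 z^-1 + 0.5 z^-2, which has a zero outside the unit circle, is flagged as unstable.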

The all-pole modelling problem can now be stated as the
minimization of the output residual energy with respect to the
reflection coefficients $K_m$. One can then use the recursion (15)
to determine the predictor coefficients. Again, here, we have
two cases: windowed and unwindowed.

4. Windowed Case

For the case where the data is stationary or windowed, one
can show that, at each stage, the energy of the forward residual
is equal to the backward residual energy:

$$E_m = \overline{f_m^2(n)} = \overline{g_m^2(n)}, \quad \text{for all } m, \qquad (19)$$

where the overbar denotes summation over n, in this case over all
time. From (13b) and (19), one can show that

$$E_m = E_{m-1} + 2K_m\,\overline{f_{m-1}(n)\,g_{m-1}(n-1)} + K_m^2\,E_{m-1}. \qquad (20)$$

By defining

$$r_{m-1} = \frac{\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{E_{m-1}} \qquad (21)$$

we have

$$E_m = (1 + 2r_{m-1}K_m + K_m^2)\,E_{m-1}. \qquad (22)$$

From (19), one can rewrite (21) as

$$r_{m-1} = \frac{\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{\sqrt{\overline{f_{m-1}^2(n)}\;\overline{g_{m-1}^2(n-1)}}} \qquad (23)$$

which is the correlation coefficient between $f_{m-1}(n)$ and
$g_{m-1}(n-1)$, the inputs to stage m. Therefore, (22) gives the
energy at stage m as a function of the energy at stage m-1, the
correlation coefficient $r_{m-1}$, and $K_m$.


For p>1, the residuals are fairly complicated functions of
$K_m$, $1 \le m \le p$, and the minimization of $E_p$ leads to a nonlinear
minimization problem. However, for a windowed or stationary
signal, the pth-order global nonlinear minimization problem can
be solved as a sequence of 1st-order local minimization problems
at each stage. We shall call this major property the lattice
minimization property, or simply the LM property. In the
implementation, if we assume that $E_{m-1}$ has already been minimized, then
the minimization of $E_m$ in (22) is accomplished by differentiating
with respect to $K_m$ to obtain the optimal value

$$K_m = -r_{m-1} \qquad (24)$$

and

$$E_m = (1 - K_m^2)\,E_{m-1}. \qquad (25)$$

Equation (24) is the reason why $\{K_m\}$ are known as partial
correlation coefficients. Since $|r_{m-1}| \le 1$, so is $|K_m| \le 1$, and $E_m \le E_{m-1}$.

Equation (25) is identical to (8d), and (24) with (21) can be
shown to be identical to (8b).

As a result of the minimization, one can show that the
backward residuals become orthogonal to each other:

$$\overline{g_i(n)\,g_j(n)} = E_i\,\delta_{ij}. \qquad (26)$$

Other correlation properties are given in [4].

Equation (26) is a very important property, which has been very
useful in adaptive lattice estimation (see Section 7).

5. Unwindowed Case - Suboptimal Solutions

We  have  seen,  in  the  windowed  case,  that  the  LM  property 
allowed  us  a recursive  computation  of  Km.  Unfortunately,  the  LM 
property  is  not  true  if  the  signal  is  not  windowed.  The 
nonlinear  minimization  problem  cannot  be  replaced  by  a sequence 
of  one-stage  minimizations.  However,  one  can  obtain  a suboptimal 
solution  by  pretending  that  the  LM  property  does  in  fact  hold. 
Due  to  the  suboptimality  of  such  a solution,  one  can  define  a 
large  number  of  suboptimal  solutions,  as  we  shall  see  below. 


Two reasonable methods are to minimize either the forward or
backward residual energy at each stage. Define

$$F_m = \overline{f_m^2(n)} \qquad (27a)$$

$$G_m = \overline{g_m^2(n)} \qquad (27b)$$

where the overbar now denotes summation over a fixed finite
interval. By substituting (13b) in (27a) and (13c) in (27b) and
differentiating with respect to $K_m$, we obtain two different
solutions:

$$K_m^f = -\frac{\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{\overline{g_{m-1}^2(n-1)}} \qquad (28a)$$

$$K_m^b = -\frac{\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{\overline{f_{m-1}^2(n)}} \qquad (28b)$$

It is clear that $K^f$ and $K^b$ have the same sign S:

$$S = \operatorname{sign} K^f = \operatorname{sign} K^b. \qquad (29)$$

One important fact is that $K^f$ and $K^b$ as defined in (28) need not
obey (9). However, one can also show that if the magnitude of
either of them is greater than 1, the magnitude of the other is
necessarily less than 1 [5]:

$$\text{If } |K^f| > 1, \text{ then } |K^b| < 1, \quad \text{or} \quad \text{if } |K^b| > 1, \text{ then } |K^f| < 1. \qquad (30)$$

Now, define the generalized rth mean between $K^f$ and $K^b$:

$$K^r = S\bigl[\tfrac{1}{2}\bigl(|K^f|^r + |K^b|^r\bigr)\bigr]^{1/r}. \qquad (31)$$

One can show that [5]

$$|K^r| \le 1, \quad \text{iff } r \le 0. \qquad (32)$$

In particular, as $r \to 0$, we obtain the geometric mean

$$K_m^0 = K_m^I = -\frac{\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{\sqrt{\overline{f_{m-1}^2(n)}\;\overline{g_{m-1}^2(n-1)}}} \qquad (33)$$

which is the negative of the correlation between $f_{m-1}(n)$ and
$g_{m-1}(n-1)$. This is the equation used by Itakura [2]. For r=-1,
we have the solution proposed by Burg [3]:

$$K_m^B = -\frac{2\,\overline{f_{m-1}(n)\,g_{m-1}(n-1)}}{\overline{f_{m-1}^2(n)} + \overline{g_{m-1}^2(n-1)}} \qquad (34)$$

Equation (34) may also be obtained by minimizing the sum of the
forward and backward residual energies. Another solution,
proposed by the author, is obtained as $r \to -\infty$, which gives the
minimum magnitude:

$$K^{-\infty} = K^M = S\,\min\bigl[\,|K^f|,\ |K^b|\,\bigr]. \qquad (35)$$

One can show that

$$|K^M| \le |K^B| \le |K^I|. \qquad (36)$$

Therefore, there is a host of suboptimal solutions to choose
from, all of which obey (9). Whichever method one chooses, the
procedure for the solution is as follows: Compute $K_1$, then
evaluate $f_1(n)$ and $g_1(n)$ in the range of interest; now compute
$K_2$, then evaluate $f_2(n)$ and $g_2(n)$, etc. While this procedure is
straightforward, it can be expensive computationally, especially
for large N. We now present an alternate implementation that is
more efficient.
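
The family of solutions (31) can be evaluated with a small helper once the forward and backward estimates of (28) are known. The sketch below is illustrative only: the function name is hypothetical, and the limiting cases r = 0 and r = -infinity are handled explicitly rather than numerically.

```python
import math

def generalized_mean_k(kf, kb, r):
    """Generalized r-th mean (31) of the forward and backward estimates.

    r = 0 gives the geometric mean (33), r = -1 the harmonic mean (34),
    and r = -inf the minimum-magnitude solution (35).
    """
    af, ab = abs(kf), abs(kb)
    if af == 0.0 or ab == 0.0:
        return 0.0                      # both estimates share the same numerator
    s = math.copysign(1.0, kf)          # common sign S of (29)
    if r == 0:
        return s * math.sqrt(af * ab)
    if r == -math.inf:
        return s * min(af, ab)
    return s * (0.5 * (af ** r + ab ** r)) ** (1.0 / r)
```

As a usage example, the pair K^f = -0.5, K^b = -2 (a case where the backward estimate violates (9)) yields -1 for r = 0, -0.8 for r = -1 and -0.5 for r = -infinity, in line with the ordering (36).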

Efficient  Implementation 

We first compute the signal covariances R(k,i) from (12).
The remainder of the solution uses R(k,i) and $a_m(k)$ for
m=1,2,...,p. From (14) and (16), one can write

$$f_m(n) = \sum_{k=0}^{m} a_m(k)\,x(n-k) \qquad (37a)$$

$$g_m(n) = \sum_{k=0}^{m} a_m(m-k)\,x(n-k). \qquad (37b)$$

One can easily show that [5]

$$\overline{f_{m-1}(n)\,g_{m-1}(n-1)} = \sum_{k=0}^{m-1}\sum_{i=0}^{m-1} a_{m-1}(k)\,a_{m-1}(i)\,R(k,\,m-i) \qquad (38a)$$

$$\overline{f_{m-1}^2(n)} = \sum_{k=0}^{m-1}\sum_{i=0}^{m-1} a_{m-1}(k)\,a_{m-1}(i)\,R(k,i) \qquad (38b)$$

$$\overline{g_{m-1}^2(n-1)} = \sum_{k=0}^{m-1}\sum_{i=0}^{m-1} a_{m-1}(k)\,a_{m-1}(i)\,R(m-k,\,m-i). \qquad (38c)$$

These values can be used with any of the solutions for $K_m$.
Therefore, the procedure now is as follows: Compute $K_1$, then
$A_1(z)$ using (8c) or (15b); evaluate the terms in (38), compute
$K_2$, then $A_2(z)$, etc. The computations in (38) can also be easily
cut by about one-half [5]. The procedure given above results in
about a four-fold computational savings over the methods of
Itakura and Burg.
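
The procedure just described can be sketched in a few lines. The code below is an illustration assuming NumPy, not the author's implementation: the function name and the `rule` keyword are hypothetical, the halving of the work in (38) is not attempted, and no guard is included against zero residual energies.

```python
import numpy as np

def covariance_lattice(x, p, rule="burg"):
    """Stage-by-stage reflection coefficients from the covariances (12) via (38)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # R[k, i] = sum_{n=p}^{N-1} x(n-k) x(n-i), eq. (12)
    R = np.array([[np.dot(x[p - k:N - k], x[p - i:N - i])
                   for i in range(p + 1)] for k in range(p + 1)])
    a = np.array([1.0])                  # coefficients of A_0(z)
    K = []
    for m in range(1, p + 1):
        fg = ff = gg = 0.0
        for k in range(m):
            for i in range(m):
                w = a[k] * a[i]
                fg += w * R[k, m - i]        # (38a)
                ff += w * R[k, i]            # (38b)
                gg += w * R[m - k, m - i]    # (38c)
        if rule == "forward":                # (28a)
            k_m = -fg / gg
        elif rule == "backward":             # (28b)
            k_m = -fg / ff
        elif rule == "itakura":              # (33)
            k_m = -fg / np.sqrt(ff * gg)
        elif rule == "burg":                 # (34)
            k_m = -2.0 * fg / (ff + gg)
        elif rule == "min":                  # (35): smaller of |K^f| and |K^b|
            k_m = -fg / max(ff, gg)
        else:
            raise ValueError("unknown rule")
        # Step-up (8c)/(15b): a_m(k) = a_{m-1}(k) + K_m a_{m-1}(m-k)
        a = np.concatenate((a, [0.0])) + k_m * np.concatenate(([0.0], a[::-1]))
        K.append(k_m)
    return a, np.array(K)
```

Each rule only changes how the three sums of (38) are combined, so the same covariance table serves every definition of K_m.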

Comparison

If the signal is windowed, then all the solutions presented
above give the same optimal solution. For an unwindowed signal,
the various definitions of $K_m$ give different suboptimal
solutions. If the signal is a sample of a random process or a
periodic signal, and N is large, the various definitions give
similar results. The problem arises for small N, or for certain
deterministic signals, as we shall see in the examples below.

Example 1 - Let x(n) be the impulse response of a single-pole
filter $1/(1 - cz^{-1})$, where 0<c<1, i.e. $x(n) = c^n$, $n \ge 0$, and x(n)=0,
n<0. Assume that we are given three samples (N=3): x(0)=1,
x(1)=c, and $x(2) = c^2$. We wish to compute the single parameter of
$A_1(z) = 1 + a_1(1)z^{-1}$, i.e. p=1. From (28) and (38), we have:

$$K_1^f = -R(0,1)/R(1,1); \qquad K_1^b = -R(0,1)/R(0,0). \qquad (39)$$

Using (12), we have for the different solutions for $K_1$:

$$K_1^f = -c; \qquad K_1^b = -1/c \qquad (40a)$$

$$K_1^I = -1; \qquad K_1^B = -\frac{2c}{1 + c^2}; \qquad K_1^M = -c. \qquad (40b)$$

Note that, of the three solutions in (40b), only $K^M$ gives the
correct result. As $c \to 1$, all solutions approach -1. This example
shows that the popular use of $K^B$ may not give the correct result.
One may conclude then, that minimizing the sum of the forward and
backward residual energies may not always be advisable.

Example 2 - Let x(n) be the impulse response of a 2-pole filter
$1/(1 - z^{-1} + z^{-2})$. For this case, x(n) is a sine wave with a
frequency of 1/6 Hz (sampling rate = 1 Hz). One can show that
$x(n) = \sin[(n+1)\pi/3]/\sin(\pi/3)$, for $n \ge 0$. The first few samples
starting with n=0 are: 1, 1, 0, -1, -1, 0, ... For p=2, $A_2(z)$ is of
the form:

$$A_2(z) = 1 + K_1(1 + K_2)z^{-1} + K_2 z^{-2}. \qquad (41)$$

Therefore, the desired values of the predictor coefficients are
$a_2(1) = -1$ and $a_2(2) = 1$, from which we find that $K_1 = -1/2$ and $K_2 = 1$.
Assume now that we are given only the first four points (N=4),
which would normally be sufficient to determine the correct
coefficient values. For N=4, we find using (12) that R(0,1)=0,
and therefore all the definitions of $K_1$ will be equal to zero,
instead of -1/2. Using (38) and (12), one can show that

$$K_2^f = 1/2; \qquad K_2^b = 1 \qquad (42a)$$

$$K_2^I = 1/\sqrt{2}; \qquad K_2^B = 2/3; \qquad K_2^M = 1/2. \qquad (42b)$$

Note that $K_2^b$ gives the only correct value, with $K_2^I$ being the
closest of the three solutions in (42b). However, since $K_1 = 0$,
the whole answer is not even close to being correct, because even
if we set $K_2 = 1$, we will have $A_2(z) = 1 + z^{-2}$, which is a sine wave
of frequency 1/4 Hz. For $K_2 < 1$, the sine wave will decay, i.e.
the poles will have a finite bandwidth.

This example has demonstrated that, irrespective of which
definition of $K_m$ one adopts, the stage-by-stage minimization may
produce drastically erroneous results. The author believes that
it is this problem that lies behind the "line shifting" and "line
splitting" phenomena observed when using Burg's method [6].

6. Unwindowed Case - Optimal Solution

A cure for the problems just presented is to perform the
optimal minimization of the forward residual. The covariance
method presented in Section 2, equations (10)-(12), gives the
optimal result. If the resulting all-pole filter is stable, then
the answer will be identical to that obtained by the nonlinear
minimization in the lattice. One can easily test the stability
of such a solution by using the reverse recursion (18) and
checking to see if all $K_m$ obey (9). If so, then there is no need
to use the lattice. If, on the other hand, the resulting filter
is unstable, then we need to use the lattice. In this case, we
perform a constrained minimization using (9) as our constraint.
The problem is greatly simplified by transforming the constrained
problem into an unconstrained problem. The idea is to perform an
appropriate transformation on $K_m$, $L_m = T(K_m)$, and minimize with
respect to $L_m$ instead [6]. Assuming we wish to minimize $F_p$ from
(27a), the gradient is then given by

$$\frac{\partial F_p}{\partial L_m} = \frac{\partial F_p}{\partial K_m}\,\frac{dK_m}{dL_m}. \qquad (43)$$

Two transformations are of particular interest:

$$L' = \sin^{-1}K; \qquad \frac{dK}{dL'} = \sqrt{1 - K^2} \qquad (44a)$$

$$L'' = \tfrac{1}{2}\log\frac{1+K}{1-K} = \tanh^{-1}K; \qquad \frac{dK}{dL''} = 1 - K^2 \qquad (44b)$$

Note that whatever the values of L' and L" are, K is
automatically constrained to obey (9). L' is a multi-valued
function of K, while L" is a single-valued function of K, which
might be preferable. As $K \to 1$, both transformations will probably
lead to problems. Fougere [6] has circumvented these problems
when using L' by setting K = U sin L', where U is a number very
close to but less than 1. One, therefore, might also use
K = U tanh L" in the same way. To the author's knowledge, no one
has used L" in this application yet, though it has been useful
for quantization purposes [7].


In a gradient search algorithm, the updating equation is
usually of the form

$$L_m(i+1) = L_m(i) - \alpha\,\frac{\partial F_p}{\partial L_m}\bigg|_{L_m = L_m(i)}, \qquad (45)$$

where i is the iteration number. Therefore, we need to compute
(43), and hence need

$$\frac{\partial F_p}{\partial K_m} = 2\,\overline{f_p(n)\,\frac{\partial f_p(n)}{\partial K_m}}. \qquad (46)$$

The lattice is very useful in evaluating (46). One can show that
(46) is easily implemented as shown in Fig. 2. Again, great
savings in computation can be obtained by making use of the
covariance of the input, as we did in (38).
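
To make the role of the transformation concrete, the sketch below applies the update (45) with K = tanh L" to a deliberately simple one-parameter objective, F(K) = A + 2BK + CK^2, which is the form taken by a single-stage residual energy. This is an illustrative toy under those assumptions (plain Python, hypothetical function name, fixed step size), not the multi-stage gradient of Fig. 2.

```python
import math

def minimize_with_tanh(B, C, alpha=0.1, iterations=500):
    """Gradient search (45) on L with K = tanh(L), for F(K) = A + 2*B*K + C*K**2."""
    L = 0.0
    for _ in range(iterations):
        K = math.tanh(L)
        dF_dK = 2.0 * B + 2.0 * C * K     # derivative of the quadratic objective
        dK_dL = 1.0 - K * K               # from (44b)
        L -= alpha * dF_dK * dK_dL        # chain rule (43) inside the update (45)
    return math.tanh(L)
```

When the unconstrained minimum -B/C lies inside the unit interval the search converges to it; when it lies outside, K creeps toward the boundary but never crosses it, and the vanishing factor 1 - K^2 slows progress, which is one way to read the remark that the transformations lead to problems as K approaches 1.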


Fougere  [6]  has  had  good  results  minimizing  the  sum  of  the 
forward  and  backward  residual  energies.  However,  his  problem  was 
that  of  a sine  wave,  where  the  poles  are  close  to  the  unit 
circle,  in  which  case  any  type  of  lattice  optimal  minimization 
would  do.  In  general,  our  experience  tells  us  that  minimizing 
only  the  forward  residual  energy  is  preferable.  This  also  cuts 
the  computational  cost. 


7. Adaptive Estimation

For applications where the short-term spectrum changes as a
function of time (such as speech and line-tracking), the lattice
offers a simple, fast-converging adaptive structure that has
given results superior to the traditional adaptive transversal
filter. In particular, the adaptive lattice seems to converge
with a speed that is independent of the eigenvalue spread of the
input signal [9]. This can be shown to be due to the
orthogonality property (26) of the backward residuals [4,8].

In  a time-varying  situation,  one  is  mainly  interested  in  the 
most  recent  history  of  the  signal.  Therefore,  it  is  reasonable 
to  weight  the  residuals  such  that  the  more  recent  values  are 
given  more  importance.  This  can  be  accomplished  by  using  a 
weighting  window,  to  be  distinguished  from  the  data  window 
mentioned  in  Section  2.  Below,  we  shall  use  the  definition  in 
(34)  for  Km;  similar  equations  can  be  written  using  other 
definitions . 

Given $K_m(n)$, $1 \le m \le p$, at time n, and the forward and backward
residuals up to time n, the reflection coefficients at time n+1
are computed from [8]

$$K_m^B(n+1) = -\frac{2\sum_{k=-\infty}^{n} w(n-k)\,f_{m-1}(k)\,g_{m-1}(k-1)}{\sum_{k=-\infty}^{n} w(n-k)\bigl[f_{m-1}^2(k) + g_{m-1}^2(k-1)\bigr]} \qquad (47a)$$

$$= -\frac{C_m(n)}{D_m(n)} \qquad (47b)$$

where w(n) is the weighting window. The value of $K_m$ in (47) is
always guaranteed to obey (9). Equation (47) is computed for
$1 \le m \le p$, then the new residuals at time n+1 are computed, etc.

Weighting  Windows 

Data windows may be quite arbitrary, and may take on
positive and negative values. In contrast, weighting windows
must always be nonnegative [8]. In particular, we must have

$$w(n) \ge 0, \quad n \ge 0; \qquad w(n) = 0, \quad n < 0. \qquad (48)$$

As examples, we present two types of windows: nonrecursive and
recursive. The nonrecursive window is the usual rectangular
window of width M:

$$w_1(n) = \begin{cases} 1, & 0 \le n \le M-1, \\ 0, & \text{otherwise.} \end{cases} \qquad (49)$$

This window has some bad effects as a data window but has good
properties as a weighting window. The recursive window is the
impulse response of a single real pole:

$$w_2(n) = \begin{cases} \beta^n, & n \ge 0, \quad 0 < \beta < 1, \\ 0, & n < 0. \end{cases} \qquad (50)$$

For this window, as well as other recursive windows, one can
compute $C_m$ and $D_m$ in (47b) recursively. For $w_2(n)$ we have from
(47) and (50):

$$C_m(k) = \beta\,C_m(k-1) + 2f_{m-1}(k)\,g_{m-1}(k-1) \qquad (51a)$$

$$D_m(k) = \beta\,D_m(k-1) + f_{m-1}^2(k) + g_{m-1}^2(k-1). \qquad (51b)$$

Other all-pole windows may be defined, but because of condition
(48), all such weighting windows must be the impulse responses of
filters with positive real poles.

For the special window $w_2(n)$, one can show that (47a) can be
written as:

$$K_m(n+1) = K_m(n) - \frac{f_{m-1}(n)\,g_m(n) + g_{m-1}(n-1)\,f_m(n)}{D_m(n)} \qquad (52)$$

This is similar to the steepest gradient algorithm of Griffiths
[9]. Because of the equality of (52) to (47a), $K_m$ in (52) is
guaranteed to obey (9) always. We point out that, with recursive
windows, one might be able to use $K^f$ instead of $K^B$ without fear
that (9) would be violated.
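
The equivalence between the ratio form (47b), with the recursions (51), and the update form (52) can be checked numerically on the first lattice stage, where f_0(n) = g_0(n) = x(n). The following is a sketch assuming NumPy; the function name is hypothetical and only a single stage is tracked.

```python
import numpy as np

def adaptive_first_stage(x, beta):
    """Track K_1 with the exponential window w_2(n) = beta**n in two ways.

    Returns (direct, update): K_1(n+1) from -C_1(n)/D_1(n), eqs. (47b) and (51),
    and K_1(n+1) from the update form (52).
    """
    C = 0.0
    D = 0.0
    k_update = 0.0                  # K_1 before any data has been seen
    direct, update = [], []
    for n in range(1, len(x)):
        f0 = x[n]                   # f_0(n)
        g0 = x[n - 1]               # g_0(n-1)
        C = beta * C + 2.0 * f0 * g0            # (51a)
        D = beta * D + f0 * f0 + g0 * g0        # (51b)
        if D == 0.0:                # no energy yet: keep the coefficient at zero
            direct.append(0.0)
            update.append(k_update)
            continue
        f1 = f0 + k_update * g0     # f_1(n) from (13b) with the current K_1(n)
        g1 = k_update * f0 + g0     # g_1(n) from (13c) with the current K_1(n)
        k_update = k_update - (f0 * g1 + g0 * f1) / D   # (52)
        direct.append(-C / D)                           # (47b)
        update.append(k_update)
    return np.array(direct), np.array(update)
```

Both sequences coincide up to rounding, and every value stays within the bound (9), because the numerator 2 f g never exceeds f^2 + g^2 in magnitude.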


Spectral  Estimation  Example 


The  data  x(n)  was  generated  by  passing  white  Gaussian  noise 
through  an  11-pole  filter  whose  transfer  function  is  shown  in 
Fig.  3.  The  adaptive  estimation  procedure  shown  in  (47b)  and 
(51)  with  B=1  was  used  in  order  to  test  the  convergence  speed  of 
the  lattice.  The  goodness  of  fit  after  each  iteration  j was 
measured  using  the  likelihood  test  of  Itakura  [10]: 


P 

- . 

k=0  i=0 


a^(k) 


a^(i)  R(i-k) 


(53) 


I(j>  = \ ! 

E 

* . P 

where $E_p$ is the minimum possible residual energy as computed from
(25), $R(i)$ is the autocorrelation corresponding to the power
spectrum in Fig. 3, and $a_j(k)$ are the predictor coefficients
computed from the reflection coefficients at iteration $j$. $I(j)$
is always greater than or equal to 1; it is equal to 1 iff the
computed filter is identical to the desired filter. Fig. 4a
shows a plot of $I(j)$ as a function of $j$ for a single record.
Fig. 4b shows the average of estimates corresponding to 15
different records. Note the speed of convergence; it is
generally proportional to $p$, but is not affected by the spectral
dynamic range of $x(n)$ (i.e., is not affected by the eigenvalue
spread, which is about 1000 in this example).
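
The likelihood ratio (53) is straightforward to evaluate once the autocorrelation is known. The following sketch is an illustration, not the procedure used for Fig. 4: it obtains the minimum residual energy from a Levinson-Durbin recursion (assumed here to stand in for (25)), and the function names and the three-lag example autocorrelation are hypothetical.

```python
def levinson(R, p):
    """Levinson-Durbin recursion.

    Returns the order-p predictor a[0..p] (a[0] = 1) and the minimum
    residual energy E_p for the autocorrelation sequence R[0..p].
    """
    a = [1.0] + [0.0] * p
    E = R[0]
    for m in range(1, p + 1):
        acc = sum(a[i] * R[m - i] for i in range(m))
        K = -acc / E
        new = a[:]
        for i in range(1, m):
            new[i] = a[i] + K * a[m - i]
        new[m] = K
        a = new
        E *= (1.0 - K * K)
    return a, E

def likelihood_ratio(a, R, E_min):
    """Ratio (53): residual energy of predictor a, divided by the minimum."""
    p = len(a) - 1
    num = sum(a[k] * a[i] * R[abs(i - k)]
              for k in range(p + 1) for i in range(p + 1))
    return num / E_min

# Example autocorrelation R(k) = 0.8**k (a first-order process), p = 2.
R = [0.8 ** k for k in range(3)]
a_opt, E_p = levinson(R, 2)
I_opt = likelihood_ratio(a_opt, R, E_p)            # optimal predictor
I_off = likelihood_ratio([1.0, -0.5, 0.0], R, E_p)  # mismatched predictor
```

For the optimal predictor the ratio is 1; any other predictor of the same order gives a larger value, which is what makes the ratio usable as a goodness-of-fit measure.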
8.  Conclusion


The  purpose  of  this  paper  was  to  show  that  the  lattice  is  a 
very  useful  tool  in  spectral  estimation,  especially  for  short 
data lengths. The orthogonality property of the backward
residuals  has  given  the  lattice  fast  convergence  properties  that 
are  especially  useful  also  in  adaptive  Wiener  filtering,  such  as 
in  adaptive  equalizers  [4,8],  and  adaptive  noise  cancelling  [9]. 
FIGURE 3. 11-pole spectrum for example 2 in text. (The spectrum
corresponds to that of an [s] sound.)


FIGURE 4. Performance of adaptive lattice as a function of
sample number (vertical axis: likelihood ratio). (a) Plot for a
single record. (b) Average over 15 records.
ABSTRACT

The  problems  of  spectrum  estimation  and  harmonic  analysis  are  examined  using  tech- 
niques suggested  by  properties  of  the  complex  Wishart  distribution.  From  this  viewpoint  the 
two  problems  are  distinct  and,  in  the  spectrum  estimation  case,  lead  to  more  stable  estimates 
than  do  conventional  nonparametric  estimates.  This  stability  is  obtained  by  replacing  the  con- 
ventional estimate  resulting  from  smoothing  over  frequency  with  a combination  of  several 
direct  estimates  computed  using  orthogonal  data  windows. 

New  results  are  also  obtained  for  the  harmonic  analysis  problem  which  is  solved  by  means 
of  a Karhunen-Loeve  expansion  in  the  frequency  domain.  One  result  of  this  approach  is  an 
analysis  of  variance  test  for  line  components  in  a spectrum. 

1.  Introduction 

Two  recurring  problems  in  time  series  analysis  are  those  of  obtaining  stable  high  resolu- 
tion estimates  of  the  spectrum  when  the  dynamic  range  is  large  and  the  problem  of  “mixed” 
spectra.  In  the  first  case  there  is  a fundamental  conflict  between  bias  and  resolution  which  is 
frequently  confused  with  a secondary  trade  between  resolution  and  variance.  The  second  prob- 
lem, that  of  the  “mixed”  spectrum  refers  to  the  important  practical  case  where  a number  of 
possibly  periodic  components  are  present  together  with  non-deterministic  noise. 

While  both  of  these  problems  have  been  extensively  studied  there  are  still  very  few  tech- 
niques available  which  work  well  in  a wide  variety  of  situations  and  in  recent  years  numerous 
“new”  techniques  and  modifications  of  existing  procedures  have  been  introduced.  Conse- 
quently one  is  faced  with  choosing  one  of  possibly  several  hundred  different  spectrum  esti- 
mates. This  choice  is  made  more  difficult  by  misapplication  of  these  techniques,  for  example 
the  use  of  the  “Burg  algorithm”  to  search  for  periodicities,  and  by  the  lack  of  rigorous  statisti- 
cal procedures  in  evaluating  alternative  methods.  Thus  the  classical  problems  of  spectrum  esti- 
mation, resolution,  stability,  hypothesis  testing,  and  confidence  statements  are  now  more  neces- 
sary than  ever. 

Because these problems have proved intractable using conventional spectrum estimation
procedures  we  venture  yet  another  method,  in  this  case  one  suggested  by  properties  of  the 
Wishart distribution. By its definition (Anderson [1958]) the Wishart distribution is that of the
estimated  covariance  matrix  of  a multivariate  normal  and  so  forms  a generalization  of  the  Chi- 
square  distribution.  However,  rather  than  attempting  to  make  inferences  on  the  structure  of 
the  process  by  applying  the  properties  of  this  distribution  to  the  observed  autocovariances  of 
the  process,  the  approach  used  here  is  to  exploit  these  properties  in  the  frequency  domain.  This 
procedure  results  in  an  estimation  procedure  which  differs  significantly  from  conventional  spec- 
trum estimates,  for  example  a stable  non-parametric  estimate  is  obtained  which  does  not 
employ  conventional  frequency  averaging. 

This  viewpoint  may  be  readily  understood  by  regarding  the  spectrum  estimation  procedure 
in terms of matrix operations. We define $\mathbf{X}^\dagger$ to be a complex row vector*

$$\mathbf{X}^\dagger = (x_0^*,\ x_1^*,\ x_2^*,\ \ldots,\ x_{m/2}^*) \qquad (1.1)$$

consisting of the Fourier coefficients of the windowed data; then the usual estimate of the
spectrum is given by

$$\mathbf{S} = \mathrm{diag}\langle \mathbf{X}\mathbf{X}^\dagger \rangle \qquad (1.2)$$

where again $\mathbf{S}$ is a vector containing the spectrum estimates at the usual equidistant
frequency spacings and diag ⟨ ⟩ indicates the operation of extracting the diagonal components
of the matrix. One of the major disadvantages of such estimates is that except for the uni- and
bivariate cases their multivariate distributions are exceedingly complex so that it is difficult
to make reliable inferences from such estimates.

* Matrices and vectors are indicated by bold face type, $\mathbf{X}$, except that $E$ indicates
the expected value operator. Fourier transforms are indicated by a "tilde", $\tilde{X}$, and
Hermitians (conjugate transpose) by a superscript † as $\mathbf{X}^\dagger$. The components of
vectors or matrices are indicated by the corresponding italic letter, usually subscripted,
$x_k$, and complex conjugates by a superscript *.

If  however  we  consider  the  matrix 

$$\mathbf{S} = \mathbf{X}\mathbf{X}^\dagger \qquad (1.3)$$

rather than its principal diagonal it is clear that much more information is available. In
particular, square submatrices taken along the diagonal of S have a Wishart distribution.
Furthermore, in regions where the spectrum is reasonably flat, the theoretical correlation matrix of the
distribution  is  known  and  depends  primarily  on  the  data  window.  For  these  and  other  reasons 
which  will  be  described  in  the  following  it  is  useful  to  study  the  problems  of  spectrum  estima- 
tion and  harmonic  analysis  in  terms  suggested  by  the  Wishart  distribution. 

Following  the  introduction  of  some  notation  we  consider  the  problem  of  estimating  the 
continuous  spectrum.  This  estimate  suggests  an  adaptive  form,  section  4,  computed  using 
orthogonal  data  windows.  The  final  section  presents  an  approach  to  the  harmonic  analysis  prob- 
lem and  a test  for  line  components. 


2.  Basic  Definitions  and  Notations 

We consider data, $y(t)$, which is a realization, possibly complex, of length $T$ from a
stationary stochastic process having finite fourth moments. Such data has a spectral
representation [see Doob (1953), chapters X, XI]

$$y(t) = \frac{1}{2\pi}\int e^{i\omega t}\,dy(\omega) \qquad (2.1)$$


The data is windowed by a function $D(t)$ and we form an estimate of the orthogonal
increment process, $dy(\omega)$, in the continuous case by the integral

$$x_\omega = \int_{-T/2}^{T/2} e^{-i\omega t}\,y(t)\,D(t)\,dt \qquad (2.2)$$

In this paper we will assume, for reasons given in Fisher (1929), that this Fourier transform,
$x_\omega$, has an approximately Gaussian distribution. This integral can be approximated crudely
using a trapezoidal rule (in which case it reduces to the same formula as the discrete case
below) or more accurately through the use of splines [see Aronson (1969)]. For the discrete
time case the estimate becomes:

$$x_\omega = \sum_{t=0}^{T-1} e^{-i\omega\left(t - \frac{T-1}{2}\right)}\,D(t)\,y(t) \qquad (2.3)$$


Usually the calculations are done on the discrete mesh of points $\omega_k = 2\pi k/m$ where
$k = 0, 1, \ldots, m/2$ when the data is real and $k = -m/2, \ldots, 0, \ldots, m/2$ when the
data is complex. In the following we will assume these conditions and for simplicity the
notation $x_{\omega_k}$ is shortened to $x_k$. It is also assumed that $m > 2T$ always and that
when high resolution is required $m$ will be much greater than $2T$. This is simply the
frequency domain equivalent of oversampling.
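
As an illustration of the discrete transform evaluated on the oversampled mesh, the sketch below computes $x_k$ for real data at $\omega_k = 2\pi k/m$, $k = 0, \ldots, m/2$. It assumes the time origin is centred on the record, as in the reading of (2.3) used here, and a window normalized to unit energy; the function name and the small example are hypothetical.

```python
import cmath
import math

def tapered_transform(y, D, m):
    """Windowed Fourier coefficients x_k on the mesh w_k = 2*pi*k/m.

    y : data samples, D : data window of the same length,
    m : mesh size (m > 2*len(y) gives frequency-domain oversampling).
    The time index is centred so that a symmetric real input gives a
    real transform (an assumption about the time normalization).
    """
    T = len(y)
    centre = (T - 1) / 2.0
    coeffs = []
    for k in range(m // 2 + 1):
        w = 2.0 * math.pi * k / m
        coeffs.append(sum(cmath.exp(-1j * w * (t - centre)) * D[t] * y[t]
                          for t in range(T)))
    return coeffs

# Example: constant data, rectangular window with unit energy, m = 4T.
T = 8
y = [1.0] * T
D = [1.0 / math.sqrt(T)] * T
X = tapered_transform(y, D, 4 * T)
direct_estimate = [abs(v) ** 2 for v in X]   # |x_k|^2 at each mesh point
```

With $m = 4T$ the mesh spacing is a quarter of the usual $2\pi/T$ spacing, so neighbouring coefficients are strongly correlated.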

In  these  equations  the  data  window,  D,  is  properly  normalized  so  that  the  common  direct 
estimate of spectrum

$$\hat{S}(\omega) = |x_\omega|^2 \qquad (2.4)$$

is unbiased for white noise. In general the expected value of this spectrum estimate is given by
the convolution

$$E\{\hat{S}(\omega)\} = S(\omega) * |\tilde{D}(\omega)|^2 \qquad (2.5)$$

which  explicitly  shows  the  dependence  of  the  estimate  on  both  the  true  spectral  density  func- 
tion of  the  process  and  also  on  the  data  window.  Further  information  on  such  estimates  is 
available  in  Thomson  (1977)  and  the  references  cited  therein. 

For reasons given in the above reference and elsewhere (e.g. Harris (1978)) we will
assume that the data window, $D$, is a spheroidal wave function, usually a prolate spheroidal wave
function. While the properties of these functions are discussed exhaustively in Slepian, Landau,
and Pollak (1961a,b, 1962) and Rhodes (1970) their most important property for the
applications given here is: Of all unit energy time-limited functions they are the most
concentrated, in the $L_2$ sense, in frequency.

3.  Estimation  of  the  Continuous  Spectrum 

In  this  section  we  will  be  concerned  primarily  with  the  problem  of  forming  a stable  esti- 
mate of  the  spectrum  in  situations  where  harmonic  lines  do  not  exist.  This  of  course  is  simply 
the  classical  “smoothing”  problem  but  when  the  problem  is  approached  from  the  Wishart 
matrix  viewpoint  the  resulting  “smoothers”  take  on  the  form  of  autoregressive  decompositions 
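applied to the Fourier transforms. To see this consider a square $p \times p$ complex submatrix,
$\mathbf{S}_k$, taken along the main diagonal of $\mathbf{S}$ about some frequency $\omega_k$.
For example when $p = 3$ a typical submatrix is

$$\mathbf{S}_k = \begin{pmatrix}
x_{k-1}^* x_{k-1} & x_k^* x_{k-1} & x_{k+1}^* x_{k-1} \\
x_{k-1}^* x_k & x_k^* x_k & x_{k+1}^* x_k \\
x_{k-1}^* x_{k+1} & x_k^* x_{k+1} & x_{k+1}^* x_{k+1}
\end{pmatrix} \qquad (3.1)$$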

Clearly the submatrix $\mathbf{S}_k$ is a sample covariance matrix of the form
$\mathbf{X}_k\mathbf{X}_k^\dagger$ where

$$\mathbf{X}_k^\dagger = [\,x_{k-1}^*,\ x_k^*,\ x_{k+1}^*\,] \qquad (3.2)$$

By  definition,  see  Anderson  (1958)  chapter  7,  Sk  has  a Wishart  distribution. 

We  now  make  the  usual  assumption  in  smoothing,  that  is  that  the  spectral  density  is 
approximately  constant  over  the  width  of  the  smoother,  and  with  this  assumption  it  can  be  seen 
that $\Sigma_0$, the population correlation matrix, is Toeplitz, and completely specified. Further,
the population covariance matrix, $\Sigma$, is simply a scalar multiple of $\Sigma_0$ with the
multiplier being the unknown spectral density, $s_k$.

As  a side  point  we  note  that  if,  instead  of  assuming  that  the  spectrum  is  locally  constant, 
we  assume  that  it  can  be  represented  locally  by  a Taylor’s  series  one  obtains  a population 
covariance  matrix  which  can  be  represented  by  the  sum 

$$\Sigma = s_{k,0}\,\Sigma_0 + s_{k,1}\,\Sigma_1 + \cdots \qquad (3.3)$$

where $\Sigma_0, \Sigma_1, \ldots$ are known Toeplitz matrices. In this case $s_{k,0}$ is an
estimate of $S(\omega_k)$, $s_{k,1}$ of $S'(\omega_k)$, etc., and may be estimated by an obvious
extension of the technique described below. This technique has significant implications from a
smoothing viewpoint for, if one expands $S$ in
a Fourier  series,  the  coefficients  may  be  estimated  by  simultaneously  minimizing  the  mean 
squared  error  on  the  fits  to  both  S and  S'.  Information  on  such  fitting  procedures  is  available  in 
Gagaev  (1957). 
3.1.  The  Correlation  Matrix  of  the  Fourier  Transforms

The components of the population covariance matrix, $\sigma_{jk}$, may be written, in the
continuous time case as

$$\sigma_{jk} = E\{x_j x_k^*\} \qquad (3.4)$$

$$\phantom{\sigma_{jk}} = \int_{-T/2}^{T/2}\!\!\int_{-T/2}^{T/2} D(x)\,D(u)\,e^{-i(\omega_j x - \omega_k u)}\,E\{y(x)\,y^*(u)\}\,dx\,du$$

When the data window, $D$, is such that the corresponding spectral window, $|\tilde{D}(\omega)|^2$,
is localized, as are the windows based on prolate spheroidal wave functions, this integral can be
simplified and becomes

$$\sigma(\xi) = \int_{-T/2}^{T/2} D^2(t)\,e^{-i\xi t}\,dt \qquad (3.5)$$

where $\sigma(\xi) \propto \sigma_{jk}$ and $\xi = \omega_j - \omega_k$. For simplicity of notation
we shall indicate $\sigma(n\Delta\omega)$ by $\sigma_n$ with the normalization $\sigma_0 = 1$.
Note that, because of the time normalization factor in 2.2 the real and imaginary parts of $x_j$
are uncorrelated so that $\sigma$ describes the autocorrelation of either.

3.2.  Spectrum  Estimates  from  the  Wishart  Distribution 

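Using the above notations the Wishart probability density function is given by

$$\frac{|\mathbf{S}_k|^{\frac{1}{2}(n-p-1)}\,\exp\left(-\tfrac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}\mathbf{S}_k\right)}{2^{\frac{1}{2}np}\,\pi^{\frac{1}{4}p(p-1)}\,|\Sigma|^{\frac{1}{2}n}\,\prod_{i=1}^{p}\Gamma\left(\tfrac{1}{2}(n+1-i)\right)} \qquad (3.6)$$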

in which $n$ represents the number of terms added together to form each element of the sample
covariance matrix and, as above, the covariance matrix is $p \times p$. Thus in the single sample
case $n = 2$ because of the real and imaginary components. Examining the log-likelihood function
one finds

$$L = -\tfrac{1}{2}\,\mathrm{tr}[\Sigma^{-1}\mathbf{S}_k] - \tfrac{1}{2}\,n\ln|\Sigma| + \text{terms independent of } \Sigma \qquad (3.7)$$

Now $\mathrm{tr}\{\Sigma^{-1}\mathbf{S}_k\}$ is given by $\frac{1}{s_k}\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\}$
and $|\Sigma|$ is $s_k^p\,|\Sigma_0|$ so that subject to the above conditions the
maximum-likelihood estimate of $s_k$ is given by

$$\hat{s}_k = \frac{1}{np}\,\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\} \qquad (3.8)$$
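
The step from the log-likelihood to the estimate can be written out explicitly. The following is a short worked derivation under the stated assumption $\Sigma = s_k\Sigma_0$ with $\Sigma_0$ known; it adds nothing beyond the substitution described in the text.

```latex
% Substitute \Sigma = s_k \Sigma_0 into the log-likelihood:
L(s_k) = -\frac{1}{2 s_k}\,\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\}
         - \frac{np}{2}\,\ln s_k
         - \frac{n}{2}\,\ln|\Sigma_0| + \text{const}
% Differentiate with respect to s_k and set the result to zero:
\frac{\partial L}{\partial s_k}
   = \frac{1}{2 s_k^2}\,\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\}
     - \frac{np}{2 s_k} = 0
\quad\Longrightarrow\quad
\hat{s}_k = \frac{1}{np}\,\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\}
```

The stationary point is a maximum because $L \to -\infty$ both as $s_k \to 0$ and as $s_k \to \infty$.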

Under the smoothness and localization assumptions given above $\Sigma_0$ is Toeplitz so that its
inverse may be found simply and the spectrum estimate given in 3.8 written as a weighted sum
of squares. This is most easily seen by writing $\Sigma_0^{-1}$ in the form, Burg (1972),

$$\Sigma_0^{-1} = \mathbf{A}\,\Gamma^{-1}\mathbf{A}^\dagger \qquad (3.9)$$

where $\mathbf{A}$ is a lower triangular matrix consisting of the prediction error filters
computed from the correlations, $\sigma(k\Delta\omega)$, of the estimated orthogonal increment
process. Similarly $\Gamma$ is a diagonal matrix of the corresponding frequency domain prediction
errors, $\Gamma = \mathrm{diag}(1, \gamma_1, \ldots, \gamma_{p-1})$. Thus the quantity
$\mathrm{tr}\{\Sigma_0^{-1}\mathbf{S}_k\}$ may be written as

$$\mathrm{tr}(\mathbf{A}\,\Gamma^{-1}\mathbf{A}^\dagger\,\mathbf{X}_k\mathbf{X}_k^\dagger) \qquad (3.10)$$

which, by the properties of the trace operator becomes

$$\mathrm{tr}(\mathbf{X}_k^\dagger\,\mathbf{A}\,\Gamma^{-1}\mathbf{A}^\dagger\,\mathbf{X}_k) \qquad (3.11)$$


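Since $\Gamma$ is diagonal this can be written as a sum of squares so that

$$\hat{s}_k = \frac{1}{np}\sum_{j=0}^{p-1}\frac{1}{\gamma_j}\left|\mathbf{a}_j^\dagger\mathbf{X}_k\right|^2 \qquad (3.12)$$

where $\mathbf{a}_j$ is the $j$-th column of $\mathbf{A}$, that is the prediction error filter of
order $j$, and $\gamma_0 = 1$.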
This form is interesting in its implications as it basically consists of a sum of direct
spectrum estimates with different data windows. The first of these windows is the one selected,
i.e. $D$, the second is almost its first derivative, and so on. Also the correlations between the
estimates in the above sum are, in the case of locally flat spectra, low and may be described in
terms  of  the  partial  correlations  of  the  data  window. 

Clearly  this  estimate  is  positive.  Also,  in  common  with  direct  and  indirect  estimates  and 
in  contradistinction  to  most  parametric  estimates,  the  estimate  corresponding  to  several 
independent  data  sets  is  simply  the  average  of  the  individual  estimates. 

4.  Direct  Estimates  Based  on  Orthogonal  Data  Windows 

However, since the title of this paper is “Estimates Motivated by the Wishart Distribution”,
we make the obvious generalization and replace the approximately orthogonal data windows
with a strictly orthonormal set of data windows. Specifically we choose to use the first $p$
prolate spheroidal wave functions or their generalizations and form the $p$ estimates of spectra

$$\hat{s}_k^{(j)} = \left|\int_{-T/2}^{T/2} e^{-i\omega_k t}\,y(t)\,D_j(t)\,dt\right|^2, \qquad j = 0, 1, \ldots, p-1 \qquad (4.1)$$

where the data windows $D_j(t)$ are proportional to the $j$-th prolate spheroidal wave function
$S_{0j}(c, 2t/T)$ and are normalized by the condition

$$\int_{-T/2}^{T/2} D_j(t)^2\,dt = 1 \qquad (4.2)$$

In these windows the parameter $c = \Omega T/2$ represents the time bandwidth product and the
windows have the property that the integral of the spectral window over $(-\Omega, \Omega)$ is
given by

$$\frac{1}{2\pi}\int_{-\Omega}^{\Omega} |\tilde{D}_j(\omega)|^2\,d\omega = \lambda_j(c) \qquad (4.3)$$

where $\lambda_j(c)$ is the $j$-th eigenvalue of the integral equation defining the functions
[Slepian & Sonnenblick (1965)], so that the fraction of energy falling outside the band is
$1 - \lambda_j(c)$.

Now, by using arguments given in detail in Thomson (1977a) the broad band bias of the
$j$-th estimate is bounded by a term proportional to $1 - \lambda_j(c)$. These eigenvalues have
the property that

$$1 > \lambda_0(c) > \lambda_1(c) > \cdots > 0 \qquad (4.4)$$

so that the bias properties of the estimates have a natural rank ordering. Obviously the
estimate having the minimum bias is that obtained using the 0-th function, and choosing
$c = 4\pi$ as an example $1 - \lambda_0 \approx 3 \cdot 10^{-10}$. This window†, discussed at
length in Thomson et al (1976), has superb bias properties. A second important property of these
eigenfunctions is that there are approximately $2c/\pi$ “large” eigenvalues after which they
rapidly decay. This number sets an upper bound on the parameter $p$, and for the $4\pi$ window the
use of the first 4 or 5 functions is reasonable so that in these terms the estimate 3.12 above is
simply

$$\hat{s}_k = \frac{1}{p}\sum_{j=0}^{p-1}\hat{s}_k^{(j)} \qquad (4.5)$$

In regions where the spectrum may be regarded as “flat” over the frequency range
$(\omega_k - 2c/T,\ \omega_k + 2c/T)$ this estimate is distributed as a $\chi^2_{2p}$. As an
example for $c = 4\pi$ and $p = 5$ this technique results in 10 degrees of freedom, better than the
6.7 dofs obtainable by frequency averaging the 0-th estimate over the same bandwidth and is
reasonably efficient relative to the 16 dofs optimistically available.


† Named by the author's colleagues the "Thomson" window.
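
The averaging in (4.5) can be sketched with any strictly orthonormal set of data windows. The example below deliberately swaps in discrete sine tapers as a simple stand-in for the prolate spheroidal wave functions, since they can be written in closed form; they are not the windows used in the paper and do not share their concentration properties. All function names are hypothetical.

```python
import cmath
import math
import random

def sine_tapers(T, p):
    """p orthonormal sine tapers of length T (stand-in for prolate windows)."""
    scale = math.sqrt(2.0 / (T + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (T + 1))
             for t in range(T)] for j in range(p)]

def eigenspectrum(y, D, omega):
    """Direct estimate |x|^2 at frequency omega using data window D."""
    centre = (len(y) - 1) / 2.0
    x = sum(cmath.exp(-1j * omega * (t - centre)) * D[t] * y[t]
            for t in range(len(y)))
    return abs(x) ** 2

def averaged_estimate(y, tapers, omega):
    """Equal-weight average of the p direct estimates, as in (4.5)."""
    return sum(eigenspectrum(y, D, omega) for D in tapers) / len(tapers)

T, p, m = 32, 4, 64
tapers = sine_tapers(T, p)
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(T)]
s_hat = [averaged_estimate(y, tapers, 2.0 * math.pi * k / m)
         for k in range(m // 2 + 1)]
```

Because the windows are orthonormal, the $p$ direct estimates are approximately uncorrelated for a locally flat spectrum, which is what gives the average its $2p$ degrees of freedom.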
To see the behaviour of these estimates better figure 1 shows the first four prolate
spheroidal wave functions in the time domain (and since they are their own Fourier transforms
also in the frequency domain). Figure 2 shows the corresponding spectral windows as a
function of frequency normalized to units of $1/T$. Figure 3 shows the spectral window obtained by
averaging the estimates according to 4.5 as a function of $p$, and it can be seen that the spectral
window approaches an “ideal” rectangular form. However, since the eigenvalues decrease,
the bias of this simple estimate is increased.

The  fact  that  these  eigenvalues  have  a natural  ordering  suggests  a further  generalization  in 
the form of an adaptive estimate. If one now computes the regression coefficient between the
logarithms of the different spectrum estimates and $\ln(1 - \lambda_j)$,

$$C_k = \sum_{j=0}^{p-1} A_j\left[\ln(\hat{s}_k^{(j)}) - \overline{\ln(\hat{s}_k)}\right] \qquad (4.6)$$

where $\overline{\ln(\hat{s}_k)}$ indicates the average over the logarithms of the $p$ spectrum
estimates, and

$$A_j = \frac{\ln(1-\lambda_j) - \frac{1}{p}\sum_{h=0}^{p-1}\ln(1-\lambda_h)}{\sum_{l=0}^{p-1}\left[\ln(1-\lambda_l) - \frac{1}{p}\sum_{h=0}^{p-1}\ln(1-\lambda_h)\right]^2} \qquad (4.7)$$

one has a powerful test for violation of the “localized estimate” hypothesis. Under this
hypothesis, that is that the differences between the $p$ spectrum estimates reflect only sampling
variation from within the “local” frequency region of $\pm 2c/T$ and that “broad band” bias is
insignificant, one has $E\{C_k\} = 0$ and $\mathrm{var}\{C_k\} = \pi^2/(6A)$ where $A$ is the term
in the denominator of 4.7. Under the alternative hypothesis, that is that the estimates are being
influenced by “broad band” bias more than by local conditions, the dependence of this bias on the
eigenvalues will be seen as large positive values of the regression coefficient $C_k$. In such cases the
estimate  can  be  made  adaptive  by  using  the  following  procedure  which  tends  to  ignore  the  more 
biased  estimates. 
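
The regression coefficient and its null variance are simple to compute from the $p$ direct estimates at one frequency. The sketch below is an illustration under the reading of 4.6 and 4.7 given above; it takes the quantities $1 - \lambda_j$ directly as input to avoid cancellation when $\lambda_j$ is very close to 1, and the function name and numerical values are hypothetical.

```python
import math

def regression_coefficient(s, one_minus_lam):
    """Regression of ln(s_j) on ln(1 - lambda_j) at a single frequency.

    s             : the p direct spectrum estimates
    one_minus_lam : the p values 1 - lambda_j
    Returns (C, var_C) with var_C = pi^2 / (6 A), the null-hypothesis
    variance quoted in the text.
    """
    p = len(s)
    x = [math.log(v) for v in one_minus_lam]
    x_bar = sum(x) / p
    A = sum((xi - x_bar) ** 2 for xi in x)          # denominator of 4.7
    logs = [math.log(v) for v in s]
    log_bar = sum(logs) / p
    C = sum((xi - x_bar) / A * (li - log_bar) for xi, li in zip(x, logs))
    return C, math.pi ** 2 / (6.0 * A)

# Hypothetical values of 1 - lambda_j for four windows.
oml = [1e-9, 1e-7, 1e-5, 1e-3]
flat = [2.0, 2.0, 2.0, 2.0]                          # no bias trend
biased = [math.exp(2.0 + 0.5 * math.log(v)) for v in oml]
C_flat, var_C = regression_coefficient(flat, oml)
C_biased, _ = regression_coefficient(biased, oml)
```

Dividing $C_k$ by the square root of the variance gives the standardized coefficient used as a bias indicator.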

Following  computation  of  the  regression  coefficient,  Ck,  the  adaptive  procedure  is  ini- 
tiated by  setting  the  unnormalized  spectrum  estimate 

$$U_k^{(1)} = \hat{s}_k^{(0)} + \hat{s}_k^{(1)} \qquad (4.8)$$

and  a cumulative  weight 


after which steps 4.10-4.13 are done for $j = 2, \ldots, p-1$. The first step of this iterative
process is to compute an estimate of the broad band bias for the $j$-th estimate


bk  = exp  | ln{s/  } + C*  In  ( 1 — ) j 

which  is  used  with  the  current  estimate  of  the  spectrum  to  compute  a weight 


’/  WA(-'~n 


next, the unnormalized sum and cumulative weight are updated

$$U_k^{(j)} = U_k^{(j-1)} + w_k^{(j)}\,\hat{s}_k^{(j)}$$

$$W_k^{(j)} = W_k^{(j-1)} + w_k^{(j)}$$

and the final estimate given by

$$\hat{s}_k = U_k^{(p-1)} / W_k^{(p-1)}$$


As  an  example  of  a situation  where  such  a procedure  is  useful  it  has  been  hypothesized 
[Gans  (1972)]  that  the  temporal  fading  characteristics  of  a particular  mobile  communications 
channel  may  be  approximately  described  in  terms  of  a stationary  process  with  a Bessel  auto- 
correlation function.  This  autocorrelation  function  corresponds  to  a band-limited  spectrum 
where the bandlimit is given by the mobile’s Doppler frequency. To check the accuracy of such
a model  it  is  necessary  to  have  confidence  in  the  processing  algorithm  and  data  corresponding 
to  this  model  was  generated  using  a Karhunen-Loeve  expansion.  To  speed  up  the  simulations  a 
frequency  translated  replication  of  the  process  was  added  to  the  hypothesized  low  pass  process 
so  that  the  behaviour  across  three  bandedges  is  observable. 

The  upper  frame  of  figure  4 shows  an  example  of  127  such  data  points  ( 1 second  ) in  the 
time domain. The lower frame shows the spectrum estimates obtained using the first five
prolate spheroidal wave functions with $c = 4\pi$ as data windows and it may be seen that, as
expected,  those  corresponding  to  the  functions  of  order  3 and  4 are  badly  biased  near  the  band 
edges.  The  adaptive  estimate  is  demonstrated  in  figure  5 with  the  upper  frame  showing  the 
regression  coefficient  standardized  by  its  theoretical  variance.  Clearly  it  is  very  effective  as  a 
bias  indicator  and,  as  can  be  seen  in  the  lower  frame  the  adaptive  spectrum  estimate  has  the 
stability  of  a smoothed  estimate  in  regions  where  bias  on  the  higher  order  functions  is  low  and 
the  low  bias  properties  of  the  zero  order  function  elsewhere. 

5.  Harmonic  Analysis  and  a Test  of  Significance

As  mentioned  in  the  introduction  one  of  the  outstanding  problems  in  spectrum  estimation 
is  that  of  “mixed”  spectra,  that  is  the  case  where  the  data  contains  a number  of  periodic  signals 
in addition to the usual purely nondeterministic component [Priestley (1962)]. The first
problem in this situation is to decide if, in fact, a given peak in an estimated spectrum is a
result of a line component or if it is simply extreme sampling variation. For this purpose the
usual approach is to compare the level of the peak with that of the local continuous component
and apply an F test with 2 and “several” degrees of freedom.

In  principle  this  situation  can  be  handled  by  expanding  the  process  in  the  time  domain 
using  a Karhunen-Loeve  expansion  with  a non-zero  mean  value  function  consisting  of  a tri- 
gonometric series  and  solving  the  resulting  likelihood  equations  for  the  coefficients  of  the 
series.  This  approach  however  presupposes  a knowledge  of  both  the  number  and  frequencies  of 
any  existing  lines,  the  covariance  function  of  the  non-deterministic  component,  and  also  is 
impractical  from  a computational  viewpoint  for  all  except  very  short  series. 

As before we make the assumption that the portion of the spectrum due to non-deterministic
components of the spectrum is slowly varying so that the matrix $\mathbf{S}_k$ is approximately
Wishart with a Toeplitz covariance matrix. The presence of the line component however
implies that $\mathbf{S}_k$ is non-central Wishart so that [see Muirhead (1978)] direct application of
likelihood procedures becomes exceedingly complex. This complexity may be circumvented by
using an eigenvalue decomposition for $\Sigma_0$ so that the test which we propose uses the
correlation properties of the windowed and transformed data and, in fact, is implemented using a
discrete  Karhunen-Loeve  expansion  in  the  frequency  domain.  This  approach  avoids  the  out- 
standing problems  with  the  time  domain  method:  the  covariance  function  of  the  estimated 
orthogonal  increment  process  is  known,  the  shape  of  the  mean  value  function  is  known,  and 
finally  the  use  of  good  windows  restricts  the  frequency  range  which  must  be  examined. 

In  this  application  use  of  the  Karhunen-Loeve  expansion  is  dictated  by  numerical  con- 
siderations. The  transformed  data  is  a band  limited  function  so  that  direct  matrix  inversion  ( for 
example  as  required  in  the  formally  equivalent  generalized  least  squares  technique  ) is  unstable. 
The  use  of  the  truncated  Karhunen-Loeve  expansion  is  sufficient  for  the  estimation  procedure 
while  retaining  numerical  accuracy. 

As above we consider a purely non-deterministic time series but now with the addition of
a line component at a frequency $\omega_0$. In this case the $x(\omega)$ has a mean value function
$\mu\tilde{D}(\omega - \omega_0)$ where $\mu$ is proportional to the amplitude of the periodic
component. Again on the assumption that the continuous component of the spectrum is slowly
varying in the vicinity of $\omega_0$ one can write the likelihood in terms of a Karhunen-Loeve
expansion about some center frequency $\omega_k$ as

$$L(\mu) = (\mathbf{X}_k - \mu\mathbf{D})^\dagger\,\Psi\Lambda^{-1}\Psi^\dagger\,(\mathbf{X}_k - \mu\mathbf{D}) \qquad (5.1)$$

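In this equation the eigenvalues and eigenfunctions are defined by the matrix equation

$$\Sigma_0\Psi = \Psi\Lambda \qquad (5.2)$$

with the eigenfunctions normalized by $\Psi^\dagger\Psi = \mathbf{I}$. Note however that, because
of the numerical problems referred to above, only the eigenvectors corresponding to the first $h$
largest eigenvalues are retained. Thus $\Psi^\dagger$ is an $h \times p$ matrix while $\Lambda$ is
an $h \times h$ diagonal matrix of these eigenvalues. The vector describing the mean value
function, $\mathbf{D}$, is defined as

$$\mathbf{D}^\dagger = \left(\tilde{D}(-\tfrac{p-1}{2}\Delta\omega),\ \tilde{D}(-\tfrac{p-3}{2}\Delta\omega),\ \ldots,\ \tilde{D}(\tfrac{p-1}{2}\Delta\omega)\right) \qquad (5.3)$$

Maximizing the likelihood the estimate, $\hat\mu$, of a possible line component amplitude is given
by the scalar equation

$$\hat\mu = \frac{\mathbf{D}^\dagger\Psi\Lambda^{-1}\Psi^\dagger\mathbf{X}_k}{\mathbf{D}^\dagger\Psi\Lambda^{-1}\Psi^\dagger\mathbf{D}} \qquad (5.4)$$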

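Under either hypothesis estimating a possible mean value function accounts for 2 degrees of
freedom so that

$$\hat\sigma^2 = (\mathbf{X}_k - \hat\mu\mathbf{D})^\dagger\,\Psi\Lambda^{-1}\Psi^\dagger\,(\mathbf{X}_k - \hat\mu\mathbf{D}) \qquad (5.5)$$

is distributed proportionally to a $\chi^2_{2h-2}$. Similarly, under the null hypothesis,

$$\hat\sigma_\mu^2 = |\hat\mu|^2\,\mathbf{D}^\dagger\Psi\Lambda^{-1}\Psi^\dagger\mathbf{D} \qquad (5.6)$$

will be proportional to a $\chi^2_2$. Since these two variance estimates are independent their
ratio,

$$F = \frac{(h-1)\,\hat\sigma_\mu^2}{\hat\sigma^2} \qquad (5.7)$$

will have, under the null hypothesis, an F distribution with 2 and $2h-2$ degrees of freedom
which may be tested for significance by standard methods.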
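
The structure of the test can be illustrated in the simplest possible setting. The sketch below assumes the Fourier coefficients have already been expressed in a whitened basis, so that $\Psi\Lambda^{-1}\Psi^\dagger$ reduces to the identity and $h$ equals the number of coefficients; this is a simplification for illustration, not the truncated Karhunen-Loeve implementation described in the text. The function name and the numbers are hypothetical.

```python
def line_f_test(X, D):
    """Amplitude estimate and F ratio for a possible line component.

    X : complex coefficients near the candidate frequency (assumed whitened)
    D : real mean-value shape at the same points
    Returns (mu_hat, F); F is referred to an F distribution with
    2 and 2h - 2 degrees of freedom, h = len(X).
    """
    h = len(X)
    dd = sum(d * d for d in D)
    mu_hat = sum(d * x for d, x in zip(D, X)) / dd           # least squares fit
    resid = sum(abs(x - mu_hat * d) ** 2 for d, x in zip(D, X))
    line = abs(mu_hat) ** 2 * dd                             # power explained by the line
    return mu_hat, (h - 1) * line / resid

# Hypothetical example: a line of amplitude 2 + 1j plus a small residual.
D = [1.0, 2.0, 1.0]
X = [(2 + 1j) * 1.0 + 0.1, (2 + 1j) * 2.0, (2 + 1j) * 1.0 - 0.1]
mu_hat, F = line_f_test(X, D)
```

A large F ratio indicates that the coefficients follow the shape of the spectral window far more closely than sampling variation alone would allow.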

In practice the test for line components has been modified so that a search procedure is
used.  The  first  stage  of  this  procedure  is  to  tag  all  local  maxima  of  the  estimated  spectrum.  In 
the  second  stage  the  F statistic  is  computed  over  a mesh  of  points  about  these  local  maxima  and 
that  frequency  which  has  the  maximum  F ratio  identified  as  “the  frequency”  of  a possible  line 
component.  Also  the  summation  is  modified  so  that  only  even  eigenfunctions  are  included;  this 
has  the  effect  of  increasing  the  sensitivity  of  the  test  and  also  of  making  it  insensitive  to  slope 
in  the  spectrum.  The  final  step  in  the  procedure  is,  when  a significant  line  component  is 
detected,  to  first  remove  the  estimated  mean  value  function  from  the  Fourier  transform,  and 
second  to  modify  the  resulting  spectrum  by  replacing  the  power  thus  removed  at  the  estimated 
line  frequency. 

As a test of this theory data similar to that used in the preceding example with the
addition of an AR-5 base component and three low level non-harmonic sinusoids was generated.
Figure 6 shows a typical time domain realization (127 data points) and the unsmoothed direct
estimate computed using the $4\pi$ prolate window. Also plotted is the “marker” sequence
showing the peaks to be examined. The results of this examination are shown in figure 7. The
upper grid gives values of F at the selected peaks while the lower grid shows the spectrum
estimate with the “redistribution” of power. Limited experience with this procedure has shewn
that it is accurate both with respect to correctly classifying peaks as lines or sampling variation
and also for correctly estimating the frequencies of peaks with typical accuracies being better
than $1/4T$.
6.  Summary  and  Conclusions

Using  frequency  domain  characteristics  of  tapered  and  transformed  samples  of  a stationary 
time  series  suggested  by  the  Wishart  distribution  two  new  techniques  for  time  series  analysis 
have  been  described.  The  first  of  these  is  well  adapted  to  estimating  the  continuous  component 
of  a spectral  density  function  and  combines  high  resolution  with  stability.  The  second  tech- 
nique gives  a simple  and  accurate  test  for  line  components  in  a spectrum.  Both  techniques  per- 
form well  on  simulation  data. 
Introduction


This  paper  concerns  itself  with  the  class  of  radar  surveillance  applica- 
tions generally  referred  to  as  the  "look-down"  radar  problem.  A typical  ex- 
ample is  detection  of  low  flying  air  vehicles  using  airborne  or  spaceborne 
radars.  This  paper  addresses  the  use  of  advanced  spectral  estimation  tech- 
niques (as  compared  with  the  conventional  Fourier  transform)  to  distinguish 
between  signals  from  targets  and  those  from  background  clutter. 

This  paper  should  be  viewed  as  a status  report  in  as  much  as  the  inves- 
tigation was  only  recently  initiated.  Of  what  follows,  approximately  one 
third  is  devoted  to  an  exposition  of  the  problem,  one  third  to  the  spectral 
estimation  techniques  considered  and  one  third  to  preliminary  results.  These 
results  are  promising  and  it  is  hoped  that  this  report  will  interest  others 
in  the  problem  and  motivate  further  research. 

The  Look-Down  Radar  Problem 


Horizon  limits  and  local  terrain  features  can  severely  restrict  a land- 
based  radar's  capability  to  detect  low  flying  targets.  This  limitation  has 
generated  much  interest  in  airborne  or  spaceborne  radar  platforms  which  have 
a much  greater  field-of-view.  There  are  difficulties  which  must  be  overcome. 
Background  clutter  is  a major  one.  In  most  cases  the  earth  intersects  the 
radar beam pattern and much of the return from the earth (ground clutter)
cannot be separated from the target using the radar's range and angle resolution
capability  (see  Fig.  1).  A typical  space  radar  viewing  low  flying  targets 
against  the  earth's  background  will  encounter  50-60  dBsm  of  clutter  radar 
cross  section  (RCS). 


This work was supported, in part, by Rome Air Development Center under
Contracts F30602-77-C-0231 and F30602-75-C-0122.
The traditional approach to this problem is to separate the targets from this
clutter by exploiting the Doppler shifted frequency of a return from a moving
target. The spectral characteristics of the radar return and the ability to
measure this spectrum determine the capability of the look-down radar.

Processing the radar return is an obvious area where advanced spectral
estimation techniques may prove useful. To be an effective estimation
procedure in this application, the method must provide a high resolution
spectral estimate, but must also possess several properties not always present
in the newer spectral estimation techniques. The technique must accommodate a
noisy, multiple target environment. The technique must be effective when there
is little or no a priori knowledge about the target frequency. The range
component of the target's velocity determines the Doppler shift, and
variations in speed and direction of flight cause large variations in
range-rate. The method must reproduce the amplitudes of the frequency
components relatively faithfully over a large dynamic range. Unwanted signals
can easily exceed the desired target signals by a factor of 1000. If a
relatively linear response is not maintained over this dynamic range, then
various threshold tests applied to the spectrum to separate targets from noise
and clutter will not be effective.

Figure 2 shows a typical clutter spectrum with typical targets. This spectrum
is characterized by a strong central response coming from clutter illuminated
by the radar mainbeam. Additional ground clutter components come in through
the antenna sidelobes, which illuminate a large portion of the earth's
surface. In addition there are imperfections in the radar transmitter and
receiver hardware which introduce unwanted signals and a noise-like clutter
contribution. The result is a relatively white clutter floor which extends
over the frequency interval of interest and is proportional to the total
clutter signal.

The separation of the targets from the clutter requires a relatively
narrow-band Doppler filter. This in turn requires a relatively long sample of
the signal. The transmission and reception of this long time sample occupies
the radar's time, is not compatible with high search rates, and limits radar
capability. Reduced dwell times result in lower signal-to-interference ratios
and poorer detection statistics. Any spectral estimation procedure which can
improve the effective signal-to-interference ratio or the frequency resolution
capability of the radar while limiting the dwell time would be extremely
valuable.

With this in mind, consider two example problems where ultimate performance is
restricted by a dwell time constraint. The first example concerns itself with
targets such as "target a" in Fig. 2. Here the viewing geometry and target's
velocity are such that it competes only with the noise-like clutter floor. A
pulse-burst waveform is transmitted. Traditionally the receiver processes the
return using a discrete Fourier transform and an amplitude taper (i.e.
window). This target is isolated from the major clutter response and the
traditional FFT will detect this target. The detection statistics will be
determined by the signal-to-interference ratio, which is proportional to the
dwell time and inversely proportional to the effective filter resolution. If
more advanced spectral estimation techniques can narrow the filter, then they
should increase the signal-to-interference ratio and improve the detection
statistics.

FIGURE 2. Typical Spectrum with Targets and Clutter
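
The dependence of the signal-to-interference ratio on dwell time can be
checked numerically. The following is a minimal sketch, not the processing
used in this study: it assumes a single target tone that falls exactly on a
DFT bin, a white complex Gaussian clutter floor, and an untapered DFT. The
function name, amplitudes, and trial count are illustrative assumptions. Under
these assumptions the ratio in the target bin grows linearly with the number
of pulses, so a fourfold longer burst should yield roughly 10 log10(4), or
about 6 dB.

```python
import numpy as np

def dft_output_sir(n_pulses, amplitude=0.1, noise_sigma=1.0, trials=4000, seed=0):
    """Monte Carlo estimate of the SIR in the DFT bin that holds the target."""
    rng = np.random.default_rng(seed)
    n = np.arange(n_pulses)
    target_bin = n_pulses // 4  # assumption: target sits exactly on a DFT bin
    tone = amplitude * np.exp(2j * np.pi * target_bin * n / n_pulses)
    # White complex Gaussian interference with total variance noise_sigma**2.
    noise = noise_sigma * (rng.standard_normal((trials, n_pulses))
                           + 1j * rng.standard_normal((trials, n_pulses))) / np.sqrt(2)
    signal_power = np.abs(np.fft.fft(tone)[target_bin]) ** 2
    noise_power = np.mean(np.abs(np.fft.fft(noise, axis=1)[:, target_bin]) ** 2)
    return signal_power / noise_power

sir_64 = dft_output_sir(64)     # short dwell
sir_256 = dft_output_sir(256)   # dwell four times longer
gain_db = 10 * np.log10(sir_256 / sir_64)
print(f"SIR gain from a 4x longer dwell: {gain_db:.1f} dB")
```

A method that achieved the narrower effective filter without the longer burst
would, by the same argument, recover part of this gain.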

The second example concerns those targets such as "target b" in Fig. 2. Here
the target is visible against the true spectrum of background clutter but is
close enough in Doppler to the main clutter response that a fairly high
resolution with large dynamic range is required. The required resolution would
demand an unacceptably long dwell time. In this example the same waveform is
used. This problem stresses the ability to resolve a small signal from a much
larger, unwanted signal, as contrasted with the first example where the
ability to improve the effective signal-to-interference ratio was of interest.
Radar performance in these two detection scenarios is limited by the effective
resolution of the spectral estimation process. The remainder of this paper is
devoted to the examination of whether or not performance may be improved by
making better use of signal processing, specifically spectral estimation
techniques.

Spectral  Estimation  Techniques 

By definition, the power spectrum of a signal s(t) is given by the squared
magnitude of its Fourier transform

$$P(f) = \left| \int_{-\infty}^{\infty} s(t)\, e^{-2\pi j f t}\, dt \right|^{2} \tag{1}$$

This expression is simple enough, and there exists a very efficient algorithm
for estimating P(f) from discrete samples, namely the famous fast Fourier
transform or FFT. The problem of course comes about when only a short segment
of s(t) is known. The conventional approach is to assume s(t) is zero outside
the region where it has been sampled, or that it repeats periodically, and to
apply Eqn. (1) using the FFT or perhaps a windowed FFT. In either case the
expected value of the result is the true spectrum convolved with the spectrum
of the window. The latter gives rise to the familiar diffraction limit, which
says that frequency resolution finer than about $T^{-1}$ is not obtained,
where T is the duration of the sampled segment.
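
The diffraction limit is easy to reproduce. The sketch below is an
illustration under stated assumptions rather than part of the original study:
two equal-amplitude, noise-free complex tones, a 64-sample record, a
rectangular window, and a zero-padded FFT. The function name and the 3 dB peak
criterion are choices made here for illustration. It counts how many distinct
peaks within 3 dB of the maximum appear in the periodogram for a given tone
separation.

```python
import numpy as np

def count_resolved_peaks(separation, n=64, nfft=4096):
    """Count periodogram peaks within 3 dB of the maximum for two equal tones.

    `separation` is in cycles per sample; the record length is T = n samples.
    """
    t = np.arange(n) - (n - 1) / 2.0            # centred time axis
    f1 = 0.25 - separation / 2
    f2 = 0.25 + separation / 2
    x = np.exp(2j * np.pi * f1 * t) + np.exp(2j * np.pi * f2 * t)
    p = np.abs(np.fft.fft(x, nfft)) ** 2         # zero-padded periodogram
    # A peak is a local maximum that is also within 3 dB of the global maximum.
    is_peak = (p > np.roll(p, 1)) & (p >= np.roll(p, -1)) & (p >= 0.5 * p.max())
    return int(np.count_nonzero(is_peak))

wide = count_resolved_peaks(2.0 / 64)    # tones 2/T apart
narrow = count_resolved_peaks(0.5 / 64)  # tones 0.5/T apart
print("peaks at 2/T separation:", wide)
print("peaks at 0.5/T separation:", narrow)
```

With a separation of 2/T the two tones appear as separate peaks, while at
0.5/T they merge into a single peak, whatever amount of zero padding is used.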

It is well known that the optimum filter for a bandlimited signal in white
gaussian noise is the matched filter. When the desired signal is a
Doppler-shifted replica of the original, then the Fourier transform evaluated
at the true Doppler frequency acts as a matched filter. However, in practice
background clutter is not white and in fact a weighted (tapered) DFT is often
superior to the ordinary DFT or matched filter. Our results indicate that
linear prediction and maximum entropy may be good candidates to replace the
weighted DFT or perhaps even the matched filter.

In recent years, alternatives to the windowed transform approach have been put
forward which do not require the unrealistic assumption that unknown data
equals zero. [1-3] Of these, the most appealing for the above detection
problem are those for which no additional a priori information, such as a
restricted frequency band, is required.

The  two  techniques  singled  out  for  this  feasibility  exercise  are 

1.  maximum  entropy  techniques 

2.  linear  prediction  techniques. 

Bowling [4] has shown that both of these approaches are superior to the
conventional one in (a) resolving two closely spaced signals and (b)
estimating the true frequency of truncated time signals.

Those interested in the mathematical details of the techniques are referred to
the work of Bowling and others. [1,2] Suffice it to say here that

a.  The maximum entropy method produces a power spectrum which is
    consistent with existing data and is maximally noncommittal about
    missing data, and

b.  The linear extrapolation method assumes that the missing data can be
    predicted by autoregressive techniques and that the power spectrum
    can be estimated by Fourier transforming the extrapolated time
    sequence.

These  techniques  are  applied  to  the  two  sample  problems  in  the  following 
section. 
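
For readers who want a concrete picture of the two techniques, the following
is a minimal sketch. It assumes the Burg recursion as the means of fitting the
autoregressive coefficients; the source does not state which algorithm was
actually used, so this is a substitution made for illustration. The function
names, the model order of 8, and the test signal are likewise assumptions. The
same coefficients serve both methods: the maximum entropy spectrum is the
residual power divided by the squared magnitude of the prediction-error filter
response, and the linear prediction method runs the filter forward to extend
the record before an ordinary FFT.

```python
import numpy as np

def burg(x, order):
    """Burg estimate of AR coefficients a (with a[0] = 1) and residual power."""
    x = np.asarray(x, dtype=complex)
    f, b = x[1:].copy(), x[:-1].copy()   # forward and delayed backward errors
    a = np.array([1.0 + 0j])
    power = np.mean(np.abs(x) ** 2)
    for _ in range(order):
        # Reflection coefficient minimising forward plus backward error power.
        k = -2.0 * np.vdot(b, f) / (np.vdot(f, f).real + np.vdot(b, b).real)
        a_ext = np.append(a, 0)
        a = a_ext + k * np.conj(a_ext[::-1])          # Levinson update
        f, b = (f + k * b)[1:], (b + np.conj(k) * f)[:-1]
        power *= 1.0 - abs(k) ** 2
    return a, power

def mem_spectrum(a, power, nfft=1024):
    """Maximum entropy (autoregressive) power spectrum on an nfft-point grid."""
    return power / np.abs(np.fft.fft(a, nfft)) ** 2

def extrapolate(x, a, n_total):
    """Extend x to n_total samples using x[n] = -sum_i a[i] * x[n - i]."""
    p = len(a) - 1
    y = np.zeros(n_total, dtype=complex)
    y[:len(x)] = x
    for n in range(len(x), n_total):
        y[n] = -np.dot(a[1:], y[n - p:n][::-1])
    return y

# Illustrative data: one complex tone at f0 = 0.25 cycles/sample in weak noise.
rng = np.random.default_rng(1)
n = np.arange(64)
x = np.exp(2j * np.pi * 0.25 * n) + 0.05 * (rng.standard_normal(64)
                                            + 1j * rng.standard_normal(64))
a, power = burg(x, order=8)
x_ext = extrapolate(x, a, 256)
lp_peak = np.fft.fftfreq(256)[np.argmax(np.abs(np.fft.fft(x_ext)))]
mem_peak = np.fft.fftfreq(1024)[np.argmax(mem_spectrum(a, power))]
print("linear prediction peak:", lp_peak, " maximum entropy peak:", mem_peak)
```

The order is kept well below the record length here so that the sketch stays
well conditioned; the sensitivity of both methods to the choice of order is
one of the open questions raised later in this paper.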

Numerical  Examples 

Two  representative  clutter  spectra  are  considered, 

a.  white  noise 

b.  a large  Gaussian  spectrum  with  a floor  of 
white  noise. 

Case  (b)  is  the  more  typical  of  a look-down  surveillance  scenario,  while 
case  (a)  is  appropriate  if  the  signal  is  passed  through  a whitening  filter 
designed  to  output  white  noise  if  the  input  is  similar  to  case  (b). 
The procedure followed was to select the desired power spectrum, generate a
signal with that spectrum, then add a target of a specified strength and
frequency. The signal was then inverse transformed using a 256 point FFT. The
first 64 points were used as the known data and the rest were discarded. These
64 points were then spectrum analyzed by three methods:

1.  64-point FFT

2.  Linear prediction (50 lags, extrapolated to 256 points)

3.  Maximum Entropy (50 lags).

The results consisted of plots of the power spectra and a determination of the
signal-to-mean noise and the signal-to-peak noise.
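
The data generation step described above can be sketched as follows. This is
an illustrative reconstruction, not the original program: the clutter
amplitudes are drawn as complex Gaussian values, the mainlobe width and level
are arbitrary, and the function names and the `mainlobe_halfwidth` parameter
are hypothetical. Only the 64-point FFT branch of the comparison is shown.

```python
import numpy as np

def make_record(rng, colored, target_bin=100, target_db=16.0,
                n_true=256, n_keep=64):
    """Build a 256-point 'true' spectrum, invert it, and keep 64 samples."""
    freqs = np.fft.fftfreq(n_true)
    psd = np.ones(n_true)                      # white floor, unit mean power
    if colored:
        # Assumed Gaussian mainlobe clutter centred on zero Doppler.
        psd += 1e4 * np.exp(-0.5 * (freqs / 0.01) ** 2)
    clutter = np.sqrt(psd / 2) * (rng.standard_normal(n_true)
                                  + 1j * rng.standard_normal(n_true))
    true_spectrum = clutter.copy()
    true_spectrum[target_bin] += 10 ** (target_db / 20)  # target above the floor
    x = np.fft.ifft(true_spectrum)
    return true_spectrum, x[:n_keep]

def fft_ratios(x, target_freq, mainlobe_halfwidth=0):
    """Signal-to-mean and signal-to-peak clutter ratios from an untapered FFT."""
    p = np.abs(np.fft.fft(x)) ** 2
    n = len(x)
    k = int(round(target_freq * n)) % n
    clutter = np.ones(n, dtype=bool)
    clutter[k] = False
    if mainlobe_halfwidth > 0:                 # optionally exclude the mainlobe
        idx = np.arange(-mainlobe_halfwidth, mainlobe_halfwidth + 1) % n
        clutter[idx] = False
    return p[k] / p[clutter].mean(), p[k] / p[clutter].max()

rng = np.random.default_rng(0)
true_spectrum, record = make_record(rng, colored=False)
s_over_c, s_over_cm = fft_ratios(record, 100 / 256)
print("S/C:", s_over_c, " S/CM:", s_over_cm)
```

The target is placed at bin 100 of 256 so that it also falls exactly on bin 25
of the 64-point transform; a target between bins would lose some power to
straddling.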

Figures 3-7 present typical results. Table I summarizes the results for five
different cases considered. The signal-to-mean clutter ratio (S/C) and the
signal-to-peak clutter (S/CM) are presented for the three spectral estimation
techniques considered. Each result is for a single run.

It is appropriate to say a bit more about how the various techniques should be
judged. Usually the signal processor can be thought of as a bank of filters.
The outputs of these filters are then used to make decisions about whether
targets of interest are present, how many targets there are, what their
positions and velocities are, etc. In order to do so, one must know the
probability distribution of the filter output for targets plus interference
and for interference alone. These distributions determine the detection
probability for a given false alarm probability or vice versa. Naturally the
techniques which yield the higher detection probability are more desirable.

There are two ways of assessing the detection and false alarm probabilities.
One is by deriving mathematical expressions for the probability distributions
for signal plus interference and interference alone. The other is to perform
experiments and gather statistics. These would then be used to infer the
underlying detection statistics and overall merits of the various techniques.
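
As an illustration of how such distributions translate into detection and
false alarm probabilities, the sketch below treats the simplest case, which is
an assumption made here and not a result of this study: a single DFT bin
followed by a square-law detector, complex Gaussian interference of unit mean
power, and a non-fluctuating target. In that case the interference-alone
output is exponentially distributed, so the threshold for a false alarm
probability Pfa is -ln(Pfa), and the detection probability is estimated by
simulation. The function name and trial count are illustrative.

```python
import numpy as np

def detection_stats(snr_db, pfa=1e-2, trials=200_000, seed=0):
    """Threshold, empirical false alarm rate, and empirical detection rate."""
    rng = np.random.default_rng(seed)
    # Complex Gaussian interference with unit mean power in the filter output.
    noise = (rng.standard_normal(trials)
             + 1j * rng.standard_normal(trials)) / np.sqrt(2)
    # For unit-power complex Gaussian noise, P(|noise|^2 > t) = exp(-t).
    threshold = -np.log(pfa)
    amplitude = 10 ** (snr_db / 20)
    pfa_hat = np.mean(np.abs(noise) ** 2 > threshold)
    pd_hat = np.mean(np.abs(amplitude + noise) ** 2 > threshold)
    return threshold, pfa_hat, pd_hat

for snr_db in (7.0, 13.0):
    thr, pfa_hat, pd_hat = detection_stats(snr_db)
    print(f"SNR {snr_db:4.1f} dB: threshold {thr:.2f}, "
          f"Pfa {pfa_hat:.4f}, Pd {pd_hat:.3f}")
```

The outputs of linear prediction and maximum entropy processing are nonlinear
functions of the data, so their distributions need not follow this simple
form; that is precisely why an empirical study is required.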

The approach taken here is to perform experiments using computer generated
targets and interference. Insufficient results are available at this time,
however, to infer any statistics. Therefore we shall merely point out some
general features which consistently appear in every case run, pose some
unanswered questions, and offer a few hypotheses.

Figure 3 shows two original spectra (256 points) with targets and two types of
clutter, colored (example 1) and white (example 2). The signal-to-mean clutter
level in each case is 16 dB. In calculating the mean clutter, the mainlobe is
not considered in the colored cases. Figure 4 shows the resulting spectra when
the 64-point DFT and the linear-prediction techniques are used. The scales are
adjusted so that the targets are at exactly the same intensity. This permits
the clutter peaks to be compared. In both cases the linear-prediction spectrum
is entirely below that of the DFT. The mean clutter level is well below that
for the DFT and the peaks are slightly lower. Figure 5 compares the DFT
spectrum and a maximum entropy spectrum overlaid in the same manner. Again the
maximum entropy clutter spectrum lies well below that of the DFT. It would be
wrong to infer that this improvement over the DFT will always exist, however,
since this is only one statistical sample of the clutter.

TABLE I

TABLE OF RESULTS SHOWING
SPECTRAL ESTIMATION PERFORMANCE FOR FIVE CASES

CLUTTER   TARGET              FFT            LINEAR           MAXIMUM
TYPE      DOPPLER x Ts                    EXTRAPOLATION       ENTROPY
                          S/C    S/CM     S/C    S/CM      S/C    S/CM

COLORED   .4              2.4    .35      26.8   1.5       24.2   1.6
WHITE     .3              6.1    1.3      36.5   2.0       17.1   3.7
COLORED   .3              9.2    1.4      25.6   1.8       44.8   2.5
WHITE     .45             7.3    1.2      21.0   1.6       11.1   1.3
COLORED   .5       [illegible]   1.3      38.2   3.16     118.0   3.0

Ts = sampling period
CM = clutter peak (excluding mainlobe)
C  = mean clutter (excluding mainlobe)
S  = target RCS

The above refer to the clutter content per frequency sample.

FIGURE 3. Two Examples of True Target and Clutter Spectra:
(1) Colored; (2) White
[Axes: relative power (dB) versus relative Doppler frequency]

FIGURE 4. Two Examples of Spectral Estimates
Comparing the DFT and Linear Prediction Methods
[Axes: relative power (dB) versus relative Doppler frequency]

Figure 6 shows the results using a second clutter sample. In this sample the
linear prediction method does not do as well. It created several clutter peaks
which are higher than those generated in the DFT. This points out that the
problem is highly statistical and requires a more extensive analysis. Figure 7
shows the maximum entropy spectrum for the same case shown in Fig. 6, and
again several noise frequencies are enhanced. Not shown but included in the
summary table is a case which was identical except for the random numbers used
to generate the true spectrum. There both the maximum entropy and linear
prediction methods were more successful than the DFT in suppressing the
clutter peaks relative to the target.

The results seem to exhibit the following properties:

1.  Both the linear prediction method and the maximum entropy
    techniques seem to select and accentuate the strongest frequencies
    which appear in the 64-point DFT spectrum. Except for the target
    and the mainlobe clutter, if it exists, there appears to be little
    correlation between the strong frequencies in the true spectrum
    (Fig. 3) and those of linear prediction or maximum entropy.

2.  There is a very high correlation between the peaks of the linear
    prediction and maximum entropy spectra.

3.  In most cases both the linear prediction and the maximum entropy
    techniques significantly improve the signal-to-mean clutter ratio
    and provide a modest improvement in signal-to-peak clutter ratio
    when compared with the 64-point DFT.

4.  The clutter spectrum shape does not provide any noticeable
    difference in the results.
FIGURE 5. Example of a Spectral Estimate
Comparing the DFT and Maximum Entropy Methods
[Axes: relative power (dB) versus relative Doppler frequency]

FIGURE 7. Comparison of a DFT and Maximum
Entropy Spectral Estimation
Of the above comments, the third is most noteworthy. If the advanced spectral
estimation techniques can provide an effective increase in the
signal-to-clutter ratio, then they would be useful signal processing
approaches in detection applications.

There are some unanswered questions. The first question is whether there is a
real improvement in detection statistics. Statistical distributions for the
clutter after processing will be required to answer this question. Are there
better spectral estimation techniques than those considered here? What is the
impact of the number of expansion coefficients on detection performance? How
far can the signal be expanded and still get improved performance? What is the
impact of signal-to-clutter ratio before processing on the utility of these
techniques?

Conclusions 

As is typical of most preliminary exercises, this one has generated more
questions than answers. However, the results indicate some improvement is
possible and should encourage further study. Taking these results at face
value one can conclude that there is a possible improvement of at least one dB
signal to clutter (based on the signal-to-maximum clutter ratio) using either
the linear prediction method or the maximum entropy method.

These results are all the more positive when one considers that in practice a
tapered window such as Hamming or Chebyshev would be used in conjunction with
the DFT to suppress sidelobes. A tapered window always decreases the effective
resolution and the effective dwell time for clutter suppression purposes.
Therefore any method which does as well as the untapered DFT and does not
produce a high sidelobe level will be superior to the tapered DFT.
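
The cost of tapering can be quantified through the equivalent noise bandwidth
of the window, a standard figure of merit that is not used in the text above
and is introduced here only as an illustration. The sketch below computes it
in units of DFT bins for a 64-point untapered window and a 64-point Hamming
window; the function name is an assumption.

```python
import numpy as np

def enbw_bins(window):
    """Equivalent noise bandwidth of a window, in units of DFT bins."""
    w = np.asarray(window, dtype=float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

rect = enbw_bins(np.ones(64))        # untapered DFT
hamming = enbw_bins(np.hamming(64))  # Hamming taper
print(f"rectangular: {rect:.3f} bins, Hamming: {hamming:.3f} bins")
print(f"noise bandwidth penalty of the taper: {10 * np.log10(hamming / rect):.2f} dB")
```

A value above one means each tapered filter admits more white interference
than an untapered filter of the same length, which is the loss in effective
dwell time referred to above.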
Abstract


Maximum entropy spectral analysis (MESA) offers some advantages for imaging
discrete targets when a data set is too short for conventional Fourier
procedures to resolve important spectral features. Radar measurements can, for
example, provide estimates of the length, cross-range dimension, and spin rate
of an object.

This  paper  first  derives  the  specifications  for  radar  waveforms  with 
which  target  features  can  be  estimated  satisfactorily  using  standard  Fourier 
techniques.  If  the  waveform  is  not  optimal,  then  MESA  may  help  to  compensate 
for  the  loss  in  resolution  otherwise  sustained  in  conventional  processing. 

1.  Radar  Imaging  Principles 

Imaging  with  standard  techniques  of  spectral  analysis  has  been  discussed 
in  the  literature,  although  the  use  of  alternative  spectral  techniques  for 
this  purpose  is  a recent  application  [1]. 

Sizing a radar target in cross-range relies upon detecting those frequency
components present in a coherent burst of pulses that are correlated with the
motion of scattering centers relative to the target's center of motion. For
example, if a range resolution cell contains N scattering centers, then the
total radar return from that cell is the coherent sum of N individual
contributions

$$A \exp[i\phi(t)] = \sum_{n=1}^{N} A_n \exp[i\phi_n(t)] \tag{1}$$

where $A_n$ and $\phi_n(t)$ are the amplitudes and phases of the N scattering
centers. The phase $\phi_n(t)$ is simply

$$\phi_n(t) = 4\pi R_n(t)/\lambda \tag{2}$$

where $R_n(t)$ is the one-way range to the nth scatterer, and $\lambda$ is the
radar wavelength. If the scatterer is spinning about an axis, then the range
changes in time to produce a variation in phase that appears as a sinusoidal
component in a series of consecutive samples (pulses) of Equation (1). (We
have assumed any accelerations are negligible during the observation period,
although such could be detected and accounted for.) If we write the range as

$$R_n(t) = v_n t + R_{n0} \tag{3}$$

we find that the frequency component produced by the motion of the nth
scatterer is simply

$$f_n = 2 v_n/\lambda \tag{4}$$

where $v_n$ is the velocity of the scatterer during the observation interval.
If the angular spin rate $\omega_{spin}$ (rad/sec) is known, then the distance
of the scatterer from the spin axis, as viewed at an aspect angle $\Omega$, is

$$r_n = v_n/(\omega_{spin} \sin\Omega) \tag{5}$$

and a cross-range dimension in a metric unit can be estimated.
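
As a worked illustration of Equations (4) and (5), the sketch below converts a
measured Doppler frequency into a moment arm. The function name and the
numerical values are hypothetical and chosen only to show the arithmetic.

```python
import math

def moment_arm(doppler_hz, wavelength_m, spin_rate_rad_s, aspect_deg):
    """Distance of a scatterer from the spin axis, from its Doppler frequency."""
    velocity = doppler_hz * wavelength_m / 2.0           # invert Equation (4)
    return velocity / (spin_rate_rad_s
                       * math.sin(math.radians(aspect_deg)))  # Equation (5)

# Hypothetical case: 100 Hz Doppler line, 3 cm wavelength,
# one revolution per second, viewed 30 degrees from the spin axis.
r = moment_arm(100.0, 0.03, 2.0 * math.pi, 30.0)
print(f"moment arm: {r:.3f} m")
```

The estimate scales inversely with the assumed spin rate and with the sine of
the aspect angle, so errors in either quantity carry directly into the
cross-range dimension.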


Several basic signal processing considerations must be satisfied in order to
obtain good estimates of the moment arms $r_n$:

a.  the data must not be aliased, such that the sampling rate or pulse
    repetition frequency (PRF) must be chosen to accommodate the highest
    frequency (or maximum velocity) expected to be encountered;

b.  the length of the observation period T must be long enough to
    resolve each frequency component $f_n$;

c.  the observation period T must not be so long that non-linearities
    (accelerations) cause the frequency spectrum to smear because of
    time non-stationarities.

These three requirements will determine the radar waveform, initially designed
according to Fourier principles.

The Nyquist interval spans $\pm 1/(2\delta t)$ (minus to plus the Nyquist
frequency) and converts by Equation (4) into a velocity interval spanning
$\pm \lambda/(4\delta t)$ (the ambiguous velocity interval); $\delta t$ is the
sample spacing. Therefore, the magnitude of the largest velocity must not
exceed $\lambda/(4\delta t)$ in order to avoid aliasing. If L is the largest
dimension of the target, then we calculate, since PRF = $1/\delta t$,

$$\lambda/(4\delta t) = \omega_{spin} L \sin\Omega, \quad \text{or} \quad PRF_{min} = 4\omega_{spin} L \sin\Omega/\lambda \tag{6}$$

to be the minimum pulse repetition frequency that prevents aliasing.


During the processing interval $T_p$, no scattering center may change velocity
resolution cells; otherwise, the spectral lines will be smeared across several
resolution cells. Relative to the center of mass, the largest possible change
in velocity occurs as the scattering center spins through zero velocity along
the line of sight;

$$\Delta v_{max} = 2\omega_{spin} L \sin\Omega \, \sin(\omega_{spin} T_p/2) \tag{7}$$

which we require to be confined within the velocity resolution cell

$$\delta v_{res} = \lambda/(2T_p) \tag{8}$$

Setting $\Delta v_{max} = \delta v_{res}$ and approximating $\sin(x) \approx x$,
we obtain

$$T_p = \left[ \frac{\lambda}{2\omega_{spin}^{2} L \sin\Omega} \right]^{1/2} \tag{9}$$

as the optimum length of the observation period (i.e., burst length). This
gives a velocity resolution from Equation (8) of

$$\delta v_{res} = \omega_{spin} (\lambda L \sin\Omega/2)^{1/2} \tag{10}$$

and a minimum number of samples (pulses) during $T_p$ of

$$N_{min} = (PRF_{min}) T_p = \sqrt{8 L \sin\Omega/\lambda} \tag{11}$$
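
The design relations of Equations (6) and (8) through (11) are simple enough
to collect in a short calculator. The sketch below is illustrative only; the
function name, the dictionary keys, and the numerical target parameters are
assumptions, not values taken from this paper.

```python
import math

def design_burst(length_m, spin_rate_rad_s, aspect_deg, wavelength_m):
    """Waveform parameters for cross-range imaging of a spinning target."""
    projected = length_m * math.sin(math.radians(aspect_deg))  # L sin(Omega)
    prf_min = 4.0 * spin_rate_rad_s * projected / wavelength_m        # Eq. (6)
    t_p = math.sqrt(wavelength_m
                    / (2.0 * spin_rate_rad_s ** 2 * projected))       # Eq. (9)
    dv_res = wavelength_m / (2.0 * t_p)                # Eq. (8), equals Eq. (10)
    n_min = prf_min * t_p                                             # Eq. (11)
    return {"prf_min_hz": prf_min, "burst_length_s": t_p,
            "velocity_resolution_m_s": dv_res, "n_min_pulses": n_min}

# Hypothetical target: 2 m long, one revolution per second,
# viewed at 30 degrees, with a 3 cm radar wavelength.
design = design_burst(2.0, 2.0 * math.pi, 30.0, 0.03)
for name, value in design.items():
    print(f"{name}: {value:.4g}")
```

Note that the minimum number of pulses depends only on the ratio of the
projected target dimension to the wavelength, not on the spin rate, because
the spin rate cancels between Equations (6) and (9).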
min  p 


2.  Practical  Examples 

Approximate knowledge of target parameters (L, $\omega_{spin}$) and of the
observation geometry ($\Omega$) helps in designing the radar waveform. The
minimum PRF is calculated by Equation (6), and the temporal length of the
burst is calculated by Equation (9). Examples are easily simulated. Imagine a
spinning sphere with six scattering centers symmetrically spaced around the
equator. Observed at an angle of $\Omega$ = 30 deg from the pole, the spinning
scattering centers individually contribute to the Doppler spectrum of the
target. At any time, at least three scatterers are visible and hence at least
three frequencies are present. Figure 1 demonstrates (a) the aliasing produced
by too low a PRF, (b) the lack of resolution because $T_p$ is too short, (c)
correct choices of both PRF and $T_p$, and (d) the smearing caused by too long
a $T_p$. The spinning motion is apparent in Figure 1-C to the extent that the
spin period can be estimated by timing the appearance and disappearance of a
scatterer as it becomes visible and is then shadowed.

Even if the waveform parameters are chosen properly, Equation (10) shows that
the velocity resolution ultimately depends on the radar wavelength. Using too
large a wavelength can thwart the most carefully designed imaging waveform. An
estimate of the velocity resolution necessary to image the target is useful
for selecting the wavelength.

The radar waveform specifications described above will provide satisfactory
resolution upon Fourier processing. In the event that the waveform is not
optimal for Fourier processing (burst length too short; wavelength too long),
maximum entropy may help to compensate for the loss of resolution. Figure 2
compares conventional spectra and MESA spectra when the processing interval is
too short to resolve the frequency components produced by the spinning
scatterers (cf. Figure 1-B). It is evident that MESA outperforms the Fourier
transform in resolving the individual frequencies.

Figure 3 shows field data taken by a radar on a spinning object. In this case,
both the wavelength and burst length were too short for Fourier procedures to
resolve fully the spectral lines that MESA begins to perceive. Indeed, there
are common features in the simulations (Figures 1-C and 2-B) and in the field
data (Figure 3-B) indicative of the motion of scattering centers as they spin
into and out of view.

In summary, MESA offers definite advantages in radar imaging applications when
Fourier processing of a burst waveform fails to provide sufficient cross-range
resolution. A combination of linear predictive and Fourier techniques, in
which a data set is extended by a prediction filter before Fourier
transformation rather than being padded with zeroes, offers an acceptable
compromise that combines the best features of both [1].

FIGURE 2(a,b). Fourier techniques in (a) do not fully resolve the
scattering centers on a spinning sphere which MESA (b) resolves.
Abstract

Maximum entropy frequency analysis (MEM), one of a number of autoregressive
techniques, can be used to improve the resolution of synthetic aperture radar.
Limited by Fourier transform theory, current SAR processing procedures produce
relatively large mainlobes and significant sidelobes. MEM produces narrow
mainlobes and negligible sidelobes with very limited data. Therefore we have
the potential of achieving improved resolution with presently used data sets,
attaining current resolution with reduced amounts of data, and detecting weak
signals now buried under sidelobes. Problems of applying MEM to SAR include
one-dimensionality, longer computing time than the FFT, spectral splitting,
small deviations from correct amplitude, and non-linearity.

Background 

Synthetic Aperture Radar (SAR) processing is currently limited in resolution
by Fourier transform theory [1]. Recording of the linear frequency modulated
range and Doppler pulses requires frequency analysis to obtain an image from
the SAR data. The half-power width of the Fourier transform mainlobe
determines Rayleigh resolution, and the sidelobes produced by the Fourier
transform can obscure nearby weak targets. SAR images are produced with the
data limited by the bandwidths of the chirp and Doppler signatures. As a
reduction in data produces an increase in the mainlobe width of the Fourier
transform, resolution is consequently degraded with a reduced amount of data.
Hence, SAR resolution is dependent on the amount of available data. Any
alternative method of processing which can increase resolution or achieve the
current resolution with fewer data would increase the usefulness and possibly
extend the applications of SAR. Reduction or elimination of sidelobes would
also be beneficial.

The maximum entropy frequency analysis method [2,3] (MEM, also called MESA)
can be used to produce narrower mainlobes than the Fourier transform on
limited data sets, in addition to virtually eliminating sidelobes. We have
applied MEM algorithms to SAR data and shown that, in comparison to Fourier
transform processing, finer resolution is obtained with the same data, this
resolution is retained with fewer data, and sidelobes are virtually
eliminated. MEM is related to a number of recently developed methods of
frequency analysis.

Synthetic  aperture  radar  is  primarily  used  for  producing  images  of 
terrain  and  cultural  objects  which  appear  much  like  aerial  photography  taken 
at  a low  sun  angle.  SAR  images  are  interpreted  visually,  so  that  the  tones, 
textures  and  patterns  are  the  predominant  information  desired.  The  precise 
magnitude of a return is seldom measured, as the current SAR systems are
uncalibrated.


Potential  Improvement 

Each  of  the  three  areas  for  which  MEM  processing  offers  potential  would 
provide  improvement  in  the  amount  of  information  or  enable  more  extensive  or 
economical  systems: 

The  first  area  - improvement  in  resolution  - would  aid  in  the 
identification  of  smaller  objects.  For  one  dimension,  our  experiments 
have indicated an improvement factor of approximately six when using
MEM.

The second - retaining the current resolution with less data -
would aid in the gathering of information from satellites, where data
transmission requirements are currently excessive. Indications are that,
for one-dimensional data, MEM requires 1/6 to 1/7 of the data needed for
conventional processing. If a two-dimensional algorithm is developed, the
amount of data transmission might be reduced to 1/49 of that now
required.

The third - detection of weakly reflecting objects near strong
reflectors  - would  aid  in  detecting  many  objects  which  are  now  obscured 
by  sidelobes.  Our  experiments  on  radar  data  have  shown  that  sidelobes 
virtually  disappear  in  the  image,  both  from  point  reflectors  and  from 
extended  objects  with  clutter  [4]. 
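
As a rough check on the figure quoted for the second area, and assuming (as the text suggests but does not demonstrate) that a two-dimensional algorithm would achieve the one-dimensional data reduction independently in azimuth and in range, the per-axis factors multiply:

```latex
\[
\frac{1}{6}\times\frac{1}{6}=\frac{1}{36}
\qquad\text{to}\qquad
\frac{1}{7}\times\frac{1}{7}=\frac{1}{49},
\]
```

so the quoted 1/49 corresponds to the more favorable end of the one-dimensional range.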

Problems 


Although experiments have shown that MEM can improve SAR image processing
in these three areas, several problems require answers before application to
operational systems. These problems are spectral splitting, moderate
deviation (< 3 dB) from correct amplitude, the lack of two-dimensional
algorithms, and the computational time required.

At  times  MEM  processing  will  produce  two  closely  spaced  spectral  peaks 
where  the  data  has  only  one.  We  have  not  observed  this  "splitting"  on  the 
SAR  data  we  have  processed,  probably  because  the  split  peaks  are  very  close 
together.  However,  splitting  could  be  deleterious  to  the  image.  If  it  were 
known  under  what  conditions,  or  what  objects  produce  splitting,  then  the 
interpretive procedure could accommodate it. Indeed, the splitting itself
might become diagnostic. Also, development of the theory and algorithm
appears to have eliminated this disadvantage [4].

Moderate deviations from correct amplitudes in spectral peaks have a
minimal effect on SAR. SAR produces a photograph-like image for visual
inspection. As the eye is a logarithmic detector and is primarily concerned
with differences in brightness, texture, and patterns, the moderate
deviations in the amplitudes of MEM peaks will seldom cause visual
misinterpretation.

Noise  is  more  destructive  to  MEM  computations  than  to  the  FFT.  The 
effect  of  noise  on  operational  use  of  MEM  for  SAR  will  require  investigation. 

A serious  disadvantage  is  the  current  lack  of  two-dimensional  algorithms, 
although  a development  of  the  theory  has  been  presented  [5].  Because  MEM 
does  not  retain  phase,  only  one  direction  can  be  processed  by  MEM.  The 
method  we  have  used  is  to  first  process  in  one  direction  with  the  fast 
Fourier  transform  (FFT)  and  subsequently  to  process  the  FFT  result  in  the 
perpendicular  direction  with  MEM.  In  this  way  either  the  azimuth  or  range 
directions  can  have  the  benefit  of  MEM  processing.  A cumbersome  method  is  to 
process  the  original  data  twice,  so  that  the  azimuth  data  is  enhanced  in  one 
image and the range data in the other image. Superimposition with
point-by-point multiplication and normalization will then enhance the image
in both the azimuth and range directions.
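
The two-pass procedure described above can be sketched as follows. This is an illustrative reading, not the authors' program: it assumes a complex Burg recursion for the MEM step, a square data array, and division by the maximum as the normalization. The names `burg`, `mem_spectrum`, `fft_then_mem` and `superimpose`, and the synthetic single-reflector test data, are hypothetical.

```python
import numpy as np

TINY = np.finfo(float).tiny

def burg(x, order):
    """Burg recursion for complex data: returns the prediction-error
    filter a (a[0] = 1) and the residual error power."""
    x = np.asarray(x, dtype=complex)
    f, b = x[1:].copy(), x[:-1].copy()          # forward / backward errors
    a = np.ones(1, dtype=complex)
    err = np.mean(np.abs(x) ** 2)
    for _ in range(order):
        den = np.vdot(f, f).real + np.vdot(b, b).real
        if den == 0.0:                          # data already predicted exactly
            break
        k = -2.0 * np.vdot(b, f) / den          # reflection coefficient
        a_ext = np.append(a, 0.0)
        a = a_ext + k * np.conj(a_ext[::-1])    # order-update of the filter
        err *= max(1.0 - abs(k) ** 2, 0.0)
        f, b = (f + k * b)[1:], (b + np.conj(k) * f)[:-1]
    return a, err

def mem_spectrum(x, order, n_freq):
    """MEM power spectrum of one line of data on n_freq frequency bins."""
    a, err = burg(x, order)
    denom = np.abs(np.fft.fft(a, n_freq)) ** 2
    return err / np.maximum(denom, TINY)

def fft_then_mem(data, order, mem_axis):
    """FFT along one axis, then MEM along the perpendicular axis."""
    spectrum = np.fft.fft(data, axis=1 - mem_axis)
    n_freq = data.shape[mem_axis]
    return np.apply_along_axis(mem_spectrum, mem_axis, spectrum, order, n_freq)

def superimpose(data, order):
    """Point-by-point product of the two one-directional MEM images."""
    image_a = fft_then_mem(data, order, mem_axis=1)
    image_b = fft_then_mem(data, order, mem_axis=0)
    product = image_a * image_b
    return product / product.max()              # assumed normalization

# Synthetic single "point" reflector (assumed test data) plus weak noise.
rng = np.random.default_rng(0)
n = 16
rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
data = np.exp(2j * np.pi * (5 * rows + 9 * cols) / n)
data = data + 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
image = superimpose(data, order=1)
print(np.unravel_index(np.argmax(image), image.shape))
```

Because each pass keeps only power, the product image carries no phase; it is intended for display only, consistent with the visual use of SAR imagery described earlier.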

Illustrations 

In this presentation we illustrate only the processing of data
synthesized to be similar to actual SAR data. "Point" sources are synthesized
and processed by MEM.

A SAR-synthesized "point" source was processed in both directions by
FFT. The intensity-modulated result on a CRT is shown in the upper right
of  Figure  1.  The  horizontal  and  vertical  sidelobes  are  evident.  In  the  left 
portion  of  Figure  1,  horizontal  and  vertical  intensities  are  graphed.  Note 
the  width  of  the  mainlobe  and  the  size  of  the  sidelobes. 

The  same  synthesized  data  as  used  for  Figure  1 was  then  processed  by 
the  FFT  in  the  vertical  direction,  and  by  the  MEM  algorithm  in  the  horizontal 
direction.  A similar  CRT  display  to  Figure  1 is  shown  in  Figure  2.  The 
vertical  graph  is  similar  to  those  shown  in  Figure  1,  with  wide  mainlobe  and 
significant  sidelobes.  The  MEM-processed  horizontal  direction  is  shown  in 
the  upper  left  portion  of  Figure  2.  The  sidelobes  have  disappeared,  and  the 
mainlobe  has  become  much  narrower,  particularly  at  and  near  the  half-power 
width  which  governs  resolution. 

Figure  3 shows  the  intensity-modulated  CRT  presentation  of  an  array  of 
"point"  reflectors  processed  from  synthetic  SAR  data  of  128  x 128  points  by 
the  FFT.  The  two  closely  spaced  reflectors  in  the  lower  right  are  resolved 
with  this  number  of  data.  Figure  4 shows  the  same  array  processed  from  1/4 
the number of data samples (64 x 64). Note that the two closely spaced
reflectors in the lower right are no longer resolved with an FFT processing
of  this  reduced  amount  of  data.  The  sample  spacing  is  the  same.  Contiguous 
samples  were  removed  to  reduce  the  data  set. 

When  this  reduced  amount  of  data  is  processed  by  the  MEM  algorithm,  the 
resolution  of  the  two  reflectors  is  maintained.  Figure  5 shows  the  intensity 
modulated  display  of  the  64  x 64  data  points  processed  by  FFT  in  the  vertical 
direction  and  by  MEM  in  the  horizontal.  The  upper  graph  in  Figure  5 is  a 
trace  across  the  three  lower  right  reflectors.  The  two  closely  spaced 
reflectors  are  clearly  resolved  by  MEM.  The  lower  graph  is  a vertical  trace 
across  a main  FFT  lobe  to  display  the  size  produced  with  the  data  reduced  to 
64  samples. 


Summary 


The  MEM  algorithm  has  been  shown  to  be  applicable  to  synthetic  aperture 
radar  data.  Better  resolution  than  that  found  with  the  Fourier  transform  is 
attained  with  MEM,  and  resolution  is  retained  with  a reduced  amount  of  data. 
Sidelobes are also virtually eliminated. Problems include spectral splitting,
minor deviations from true values, computing time, noise, and the
one-dimensionality of MEM.

FIGURE 1. Upper Right: Intensity-Modulated CRT Display of FFT
Processing of a Synthetic "Point" Target.
Upper Left: Graph of Intensity Along Horizontal Segment.
Lower Left: Graph of Intensity Along Vertical Segment.

FIGURE 2. Upper Right: Intensity-Modulated CRT Display of
Horizontal MEM Processing and Vertical FFT Processing.
Upper Left: Graph of Intensity Along Horizontal Segment
(Note absence of sidelobes).
Lower Left: Graph of Intensity Along Vertical Segment.

FIGURE 3. Intensity-Modulated CRT Display. FFT Processing of
128 x 128 Data Samples of Synthesized Array of "Point"
Reflectors. Note Closely Spaced Pair of Reflectors in
Lower Right.

FIGURE 4. Intensity-Modulated CRT Display of FFT Processing of
64 x 64 Data Samples of Synthesized Array of "Point"
Reflectors. Contiguous Samples Removed from 128 x 128
Data Array Used for Result Shown in Figure 3. Note
Loss of Resolution in Lower Right.

FIGURE 5. Intensity-Modulated CRT Display of Horizontal MEM
Processing and Vertical FFT Processing of 64 x 64
Data Samples. Resolution is Retained Between Lower
Right Reflector Pair. Upper Graph Shows Intensity
Trace Across Three Lower Right Reflectors. Lower
Graph Shows Vertical Intensity Trace Across Lower
Right Reflector.

Abstract


Antenna  patterns  computed  with  the  technique  of  Maximum  Entropy  offer 
improved  detection,  beamwidth,  and  resolution.  Example  patterns  are  computed 
for  omnidirectional  spatial  noise  and  complex  receiver  noise  using  8 and  16 
element  one-dimensional  antenna  arrays. 


Introduction 


The Maximum Entropy Spectral Analysis (MESA) technique as developed by
John P. Burg [1] is applied to simulated spatial data sampled by M uniformly
spaced elements of a one-dimensional antenna array. A spatial data series
is processed with the MESA technique in the same way time series data is
processed. However, it is convenient to adopt complex number notation in
representing signals incident on individual antenna elements, and consequently
it is helpful to utilize a complex MESA algorithm having complex prediction
error coefficients. The complex formulation of Maximum Entropy has been
described by Smylie, Clark and Ulrych [2] and Barnard [3].

Signals  incident  to  the  antenna  are  assumed  to  be  spatially  coherent 
plane  waves  with  real  and  imaginary  sinusoidal  components.  Spatial  noise  is 
assumed  to  be  omnidirectional  (or  white  noise)  and  is  represented  by  a 
Gaussian  amplitude  distribution  at  each  antenna  element.  However  in  some 
examples  receiver  noise  is  simulated  by  a random  complex  number  in  each  of 
the  N channels.  Receiver  noise  is  the  most  troublesome,  since  it  is  a complex 
vector  and  its  presence  distorts  both  signal  phase  and  amplitude. 


This  work  supported  in  part  by  Rome  Air  Development  Center,  Contract  No. 
F30602-75-C-0121  and  Naval  Research  Laboratory,  excepted  appointment. 
Discussion


The expression for wavenumber power spectra is given by Equation (1) in
Figure 1, where it is noted that the prediction filter $\gamma^N$ is a
whitening filter such that the prediction error power $P_N$ is a constant.
There are N filter coefficients, where N is less than the number of data
samples. The wavenumber component k is taken in the direction of the linear
antenna so that it is a function of signal angle $\theta$. The antenna
elements are spaced at half-wavelength intervals, $\Delta x$.

The error power $P_N$ and the filter coefficients $\gamma^N$ are given by the
recursive relations denoted in Equations (2) and (3). The last (N + 1) filter
coefficient is a function of the backward and forward prediction errors,
which are computed directly from the data samples. The set of equations in
Figure 1 comprises the Burg technique, which was introduced by Burg [4] in
1968.

A single signal incident at -3 degrees to the normal of an 8-element
antenna is shown detected in Figure 2 by MESA for a (S/N) value of -5 dB.
For this and other examples (except where denoted) omnidirectional spatial
noise is considered to be the dominant noise component. The signal is
detected in a MESA snapshot and in an average of 20 such snapshots. A snapshot
represents data which is recorded at one instant of time and processed with
MESA. Snapshot patterns vary significantly when processing data recorded with
short antennas. Consequently, many patterns should be averaged to obtain a
stable representative antenna pattern. Usually an average of 10 to 20 patterns
provides good stability. Single signals are readily detected in
omnidirectional noise even at low (S/N) values, as indicated by Figure 2. Also,
the beamwidth and accuracy of a single MESA peak are quite good in
omnidirectional noise.
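
The snapshot and averaging procedure can be sketched as below. This is an illustration under stated assumptions rather than the program used for the figures: the noise is taken as unit-variance complex Gaussian at each element, (S/N) is defined per element, the elements are spaced at half a wavelength, and the names `burg`, `simulate_snapshot`, `mesa_pattern` and `averaged_pattern` are hypothetical.

```python
import numpy as np

TINY = np.finfo(float).tiny

def burg(x, order):
    """Complex Burg recursion: prediction-error filter a (a[0] = 1)
    and residual error power."""
    x = np.asarray(x, dtype=complex)
    f, b = x[1:].copy(), x[:-1].copy()
    a = np.ones(1, dtype=complex)
    err = np.mean(np.abs(x) ** 2)
    for _ in range(order):
        den = np.vdot(f, f).real + np.vdot(b, b).real
        if den == 0.0:
            break
        k = -2.0 * np.vdot(b, f) / den
        a_ext = np.append(a, 0.0)
        a = a_ext + k * np.conj(a_ext[::-1])
        err *= max(1.0 - abs(k) ** 2, 0.0)
        f, b = (f + k * b)[1:], (b + np.conj(k) * f)[:-1]
    return a, err

def simulate_snapshot(rng, n_elem, signal_angles_deg, snr_db):
    """One spatial data series: plane waves plus complex Gaussian noise."""
    m = np.arange(n_elem)
    x = (rng.standard_normal(n_elem) + 1j * rng.standard_normal(n_elem)) / np.sqrt(2.0)
    for angle, snr in zip(signal_angles_deg, snr_db):
        psi = np.pi * np.sin(np.deg2rad(angle))   # phase step, half-wavelength spacing
        x = x + 10.0 ** (snr / 20.0) * np.exp(1j * psi * m)
    return x

def mesa_pattern(snapshot, order, look_angles_deg):
    """MESA antenna pattern of one snapshot over the given look angles."""
    a, err = burg(snapshot, order)
    psi = np.pi * np.sin(np.deg2rad(look_angles_deg))
    steering = np.exp(-1j * np.outer(psi, np.arange(len(a))))
    return err / np.maximum(np.abs(steering @ a) ** 2, TINY)

def averaged_pattern(rng, n_snapshots, n_elem, order,
                     signal_angles_deg, snr_db, look_angles_deg):
    """Average of n_snapshots independent snapshot patterns."""
    patterns = [
        mesa_pattern(simulate_snapshot(rng, n_elem, signal_angles_deg, snr_db),
                     order, look_angles_deg)
        for _ in range(n_snapshots)
    ]
    return np.mean(patterns, axis=0)

rng = np.random.default_rng(0)
look = np.arange(-90, 91)
pattern = averaged_pattern(rng, 20, 8, 3, [-3.0], [-5.0], look)
print("peak of averaged pattern at", look[np.argmax(pattern)], "degrees")
```

The filter size of 3 in the usage lines is an arbitrary choice for the sketch; the filter sizes actually used are those marked on the figures.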

In  each  figure  the  number  of  antenna  elements  NE,  filter  size  N,  and 
the  random  number  generator  seed  no.  IR  are  indicated  as  shown  in  Figure  2. 

The  problem  of  beam  splitting  is  illustrated  in  Figure  3 where  three 
signals  are  present  at  angles  of  -7,  0,  and  7 degrees.  The  central  signal  at 
0 degrees is split in the single MESA snapshot. However, the same three signals
are observed more accurately as three peaks in Figure 4, where 20 such
snapshots are averaged. There is an apparent interaction between the three
closely spaced signals, as indicated by the shallow nulls between signals, and
such interactions serve to limit the resolution possible with MESA.

Individual MESA snapshots of three signals located at -7, 0, and 7 degrees
are shown in Figure 5 for (S/N) values of 13 dB and 40 dB in omnidirectional
noise. Resolution could not be achieved at lower (S/N) values or for signals
spaced closer together. Of course, the three signals spaced 7 degrees apart
are not resolved with conventional phased array beam summation, as indicated
by the conventional pattern also shown in Figure 5.

The  same  three  signals  are  shown  just  resolved  in  the  MESA  snapshot  of 
Figure  6 for  a (S/N)  of  20  dB  in  complex  receiver  noise.  In  the  same  figure 
better  resolution  is  indicated  for  a (S/N)  value  of  40  dB. 

A MESA  snapshot  of  five  mixed  signals  of  the  same  (zero)  initial  phase 
is  shown  in  Figure  7 for  dominant  omnidirectional  noise.  The  (S/N)  values  of 
each  signal  are  indicated  to  vary  from  0 to  20  dB.  The  stronger  signals  are 
readily detected, whereas the weaker signal is observed only a few dB above
the MESA side peaks. All signal peaks are estimated by MESA within one degree
of the true angle of incidence. However, it is noted that the signal at 0
degrees is split into a strong and a weak component. One remarkable
characteristic of MESA is that the narrow beamwidth is maintained even at
angles near 90 degrees.

All five signals are defined with even more precision in an average of
20 MESA snapshots as shown in Figure 8. Both precision and detectability
are improved significantly by averaging. The split peak near 0 degrees is
reduced to a very narrow single peak with pattern averaging. The conventional
antenna pattern is shown for comparison, and it is observed that a signal
beamwidth is considerably narrower in patterns computed with MESA. The
weaker signal at 30 degrees is located more precisely with MESA, and of course
the strong signal at 80 degrees is very poorly defined by the conventional
pattern.

Phase shifts are investigated in the following three figures. Fougere
[5] and Ulrych and Clayton [6] have noted that signal distortion occurs with
use of MESA and the Burg technique when signals have a non-zero initial phase.
Such distortion is most severe for a phase of 90 degrees. Phase shift
distortion is demonstrated using the same five signals of the previous
examples, where the strong signal at 60 degrees is given an initial phase of
90 degrees. The computed MESA antenna pattern snapshot is shown in Figure 9a,
where it is observed that the signals at -45, 30, and 60 degrees have all
split into two components. Such splitting is virtually eliminated, however,
and the stronger signals are more precisely defined in an average of 20 such
MESA snapshots shown in Figure 9b. The weaker signal at 30 degrees does not
show much improvement in the averaged MESA pattern.

The  same  five  signals  are  again  detected  with  an  averaged  MESA  antenna 
pattern  shown  in  Figure  10,  but  the  signal  at  60  degrees  has  an  initial  phase 
of  180  degrees.  Peaks  are  more  severely  distorted  and  broadened  than  in  the 
previous  example,  and  some  peak  splitting  remains  even  after  pattern  averaging. 

The  effect  of  signal  phase  upon  the  resolution  of  two  closely  spaced 
signals  is  observed  in  the  following  two  figures.  Two  13  dB  signals  located 
at -3 and +3 degrees are shown resolved in the MESA snapshot of Figure 11.
The signal at +3 degrees is given an initial phase of 90 degrees and the
resulting averaged MESA pattern is shown in the same figure. As a result of
the initial phase shift, the two peaks appear several degrees further apart
than the actual separation of the signals.

The same two signals are again observed in an average of 20 MESA
snapshots shown in Figure 12, where the signal at +3 degrees has an initial
phase of 180 degrees. Distortion and peak splitting are observed to be the
most severe in this averaged pattern, such that the two signals are both split
into several peaks.

The  conclusions  resulting  from  this  demonstration  of  MESA  antenna 
patterns  are  summarized  in  Figure  13. 

FIGURE 4. Three Signals

FIGURE 8. Multiple Mixed Signals (20 patterns averaged)

FIGURE 11. Two Adjacent Signals with Relative Phase

FIGURE 13. Conclusions: Observed Characteristics of MESA Antenna Patterns


MAXIMUM  ENTROPY  CEPSTRAL  ANALYSIS 


TOM  LANDERS 


Applied  Seismology  Group 
Lincoln  Laboratory 

Massachusetts  Institute  of  Technology 
Cambridge, MA 02142


Abstract 


Cepstral analysis and maximum entropy spectral analysis have been
combined and applied to the problem of identifying echoes that arrive in the
waveform of a primary signal. The method is referred to as maximum entropy
cepstral analysis. By using Burg's technique for designing the complex
prediction error spectral estimator, only that part of the complex log
spectrum that lies in the band where the combination of source power and
instrument bandpass produces good signal-to-noise ratios need be used to
determine the cepstrum. By using only data with good signal-to-noise
ratios, it is expected that more precise spectra than those obtainable by
classical windowed spectral estimation methods will be obtained. The process
order, and consequently the filter length, used to compute the maximum entropy
cepstrum has been determined using Akaike's and Parzen's criteria as aids.

On theoretical data with up to 50% background noise, echoes at only a few
digitizing intervals are detected. Used on a short-period teleseismic
recording of a seismic event where the echo time is known a priori, the
technique finds the observed surface echo delay times.

Introduction 


Echo detection by cepstral analysis of band-limited signals in the
presence of broadband noise works poorly when the echo delay time is shorter
than the primary signal. This situation results from the limitations of
classical spectral techniques in resolving the delay harmonics in the complex
log spectrum, since the harmonics are embedded in noise whose amplitude
varies logarithmically in the frequency band of interest. This paper deals
with the application of a spectral estimation technique particularly suited
to the analysis of such data, maximum entropy spectral analysis. We note
that autoregressive spectral analysis and maximum entropy spectral analysis
are one and the same.

When the current value of a time series is predictable with a white
noise error from a finite linear combination of its past values, the process
is called all-pole or autoregressive [1]. The spectrum of the time series
is directly obtainable from the linear prediction coefficients and the
power level of the prediction error series. From a practical point of view,
since any real set of data is bounded in time, the technique used to estimate
the set of prediction error coefficients must be selected and applied with
care if meaningful spectral estimates are to be obtained.

The  first  section  of  this  paper  deals  with  the  algorithm  used  to 
estimate  the  prediction  error  coefficients  and  methods  used  to  determine 
the  number  of  coefficients  necessary  to  describe  the  model,  called  the 
order  number  of  the  process.  Following  sections  deal  with  the  application 
of  the  technique  to  echo  detection.  The  first  examples  examine  the  effect 
of  various  signal  to  noise  conditions  and  very  short  delay  times  using 
synthetic  data.  The  paper  concludes  with  the  analysis  of  an  actual  piece 
of  seismic  data  in  which  the  echo  delay  time  is  known  from  measurements  at 
the  source. 


Parameter Estimation


The time series for which we wish to have a spectral estimate is

\[ x_t, \quad t = -\infty, \ldots, -1, 0, 1, \ldots, \infty \tag{1} \]

for which we have the observed portion at t = 1, ..., N. The autoregressive
model of x is

\[ x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + e_t \tag{2} \]

where $e_t$, called the innovation or prediction error, is, in our case, white
noise. The spectrum is given by

\[ S(f) = \frac{2\,\Delta t\, E_p}{\left| 1 - \sum_{j=1}^{p} A_j \exp(i 2\pi f j \Delta t) \right|^2} \tag{3} \]

where $E_p$ is the total residual square prediction error for prediction
filter length p. Thus to calculate the power spectrum of x we must determine
the order number p and the p values of the prediction filter.
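
Equation (3) translates directly into a few lines of code. The sketch below reads the equation as S(f) = 2 Δt E_p / |1 - Σ_j A_j exp(i 2π f j Δt)|², with the sum running over j = 1, ..., p; the function name `ar_spectrum` and its argument names are illustrative.

```python
import numpy as np

def ar_spectrum(A, E_p, dt, freqs):
    """Autoregressive (maximum entropy) spectrum of Equation (3).

    A     : prediction coefficients A_1 ... A_p
    E_p   : residual squared prediction error for filter length p
    dt    : sampling interval
    freqs : frequencies at which to evaluate S(f)
    """
    A = np.asarray(A, dtype=complex)
    j = np.arange(1, len(A) + 1)
    phase = np.exp(1j * 2.0 * np.pi * np.outer(freqs, j) * dt)
    denom = np.abs(1.0 - phase @ A) ** 2
    return 2.0 * dt * E_p / denom

# First-order example with A_1 = 0.5 and unit error power.
print(ar_spectrum([0.5], 1.0, 1.0, np.array([0.0, 0.25, 0.5])))
```

For this first-order example the denominator is smallest at f = 0, so the spectrum peaks there and falls off toward the folding frequency.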

Several approaches are available for estimating the prediction
coefficients. The one we have chosen is due to Burg [3]. In this method the
measure of the error in the construction of the coefficients is taken as the
sum of the total squared prediction error resulting from predicting forward in
time and the same quantity for predicting backward in time, a physically
reasonable measure under the assumption that the data is stationary. Burg
showed that if the prediction coefficients for order p are produced from a
linear combination of the prediction coefficients of order p-1 and the
reverse-time prediction coefficients of order p-1, such that the above
measure of error is minimized by the choice of the scale factor of the
combination, then the autocorrelation function will necessarily be positive
definite. Use of the Burg algorithm has the side effect that the location
of spectral peaks is somewhat dependent on the initial phase [2,4]. In
practice, the degree of this error can be estimated by comparing spectral
estimates of slightly shifted versions of the data set under consideration.
While other methods for determining the prediction coefficients that have
comparable resolution to the Burg method do not seem to have this problem,
they do not necessarily produce positive definite autocorrelation functions
and are computationally less efficient. On this basis we used the Burg
technique for parameter estimation.
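
The recursion described in this paragraph can be sketched as follows. It is a textbook form of the complex Burg recursion, offered as an illustration rather than as the program used in the paper; the name `burg_ar` is hypothetical, and the returned coefficients follow the sign convention of the autoregressive model in Equation (2).

```python
import numpy as np

def burg_ar(x, order):
    """Burg estimate of the AR coefficients A_1..A_p and residual power.

    At each order the single scale factor k (the reflection coefficient)
    is chosen to minimise the sum of the forward and backward squared
    prediction errors, and the order-p filter is formed from the
    order p-1 filter and its time-reversed conjugate.
    """
    x = np.asarray(x, dtype=complex)
    f, b = x[1:].copy(), x[:-1].copy()        # forward / backward errors
    a = np.ones(1, dtype=complex)             # prediction-error filter
    err = np.mean(np.abs(x) ** 2)             # zero-order error power
    for _ in range(order):
        den = np.vdot(f, f).real + np.vdot(b, b).real
        if den == 0.0:                        # data already predicted exactly
            break
        k = -2.0 * np.vdot(b, f) / den        # minimises forward + backward error
        a_ext = np.append(a, 0.0)
        a = a_ext + k * np.conj(a_ext[::-1])
        err *= max(1.0 - abs(k) ** 2, 0.0)    # |k| <= 1 keeps the error power >= 0
        f, b = (f + k * b)[1:], (b + np.conj(k) * f)[:-1]
    return -a[1:], err                        # A_j = -a_j in the notation of Eq. (2)

# A noise-free complex sinusoid is predicted exactly by a one-term filter.
w0 = 0.7
A, E = burg_ar(np.exp(1j * w0 * np.arange(32)), 1)
print(A, E)
```

Because the magnitude of each reflection coefficient cannot exceed one, the residual power never goes negative, which is the property behind the positive definite autocorrelation function mentioned above.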

It remains to determine the value of p, the order number. The following
methods have been proposed. Akaike [5,6] suggested that the minimum of the
average error due to estimating the autoregressive coefficients and the
innovation for one-step prediction gives p. The criterion to be minimized,
called the final prediction error, FPE, is

\[ \mathrm{FPE}(M) = P_M \, \frac{N + (M+1)}{N - (M+1)} \tag{4} \]

where $P_M$ is the residual squared error for an Mth-length filter and N is
the length of x. The M for which FPE is a minimum is taken as p.
Alternatively, Akaike [7,8] suggested minimizing the log likelihood of the
innovation variance as a function of filter length to find p. This criterion,
called the information theoretic criterion, AIC, is estimated by

\[ \mathrm{AIC}(M) = \ln(P_M) + 2M/N. \tag{5} \]

Again the M for which the AIC is minimized is taken as p. A third method
considered here was proposed by Parzen [9,10] and is known as the
autoregressive transfer function criterion, CAT. The order p is given where
the estimate of the difference of the mean square error between the true
filter, which exactly gives the innovation, and the estimated filter is a
minimum. Parzen showed that this difference can be calculated, without
explicitly knowing the exact infinite filter, by

\[ \mathrm{CAT}(M) = \frac{1}{N} \sum_{j=1}^{M} \tilde{P}_j^{-1} - \tilde{P}_M^{-1} \tag{6} \]

where

\[ \tilde{P}_j = \frac{N}{N-j} \, P_j . \]

In the examples that follow the Burg method is used to determine the $A_j$'s.
For plotting purposes, noting that log FPE asymptotically approaches AIC,
we define

\[
\begin{aligned}
F(M) &= \log_{10}\bigl(\log_{10}\mathrm{FPE}(M) - \log_{10}\mathrm{FPE}(p) + 1\bigr) \\
A(M) &= \log_{10}\bigl(\mathrm{AIC}(M) - \mathrm{AIC}(p) + 1\bigr) \\
C(M) &= \log_{10}\bigl(\mathrm{CAT}(M) - \mathrm{CAT}(p) + 1\bigr)
\end{aligned}
\tag{7}
\]

such that the value of each criterion at the predicted order number, p, is 0.
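
The three order criteria and the plotting transform can be evaluated together once the residual powers are available. The sketch below assumes the readings FPE(M) = P_M (N+(M+1))/(N-(M+1)), AIC(M) = ln(P_M) + 2M/N, and the commonly quoted form of Parzen's criterion, CAT(M) = (1/N) Σ_{j≤M} 1/P̃_j - 1/P̃_M with P̃_j = N P_j/(N-j). The function names and the sample residual powers are illustrative only.

```python
import numpy as np

def order_criteria(P, N):
    """FPE, AIC and CAT for filter lengths M = 1 .. len(P).

    P : residual squared errors P_1, P_2, ... from the Burg recursion
    N : number of data samples
    """
    P = np.asarray(P, dtype=float)
    M = np.arange(1, len(P) + 1)
    fpe = P * (N + (M + 1)) / (N - (M + 1))
    aic = np.log(P) + 2.0 * M / N
    P_tilde = N / (N - M) * P                    # bias-corrected residual power
    cat = np.cumsum(1.0 / P_tilde) / N - 1.0 / P_tilde
    return fpe, aic, cat

def plotting_transform(values, p_index):
    """Log10(value(M) - value(p) + 1): zero at the selected order p."""
    values = np.asarray(values, dtype=float)
    return np.log10(values - values[p_index] + 1.0)

# Made-up residual powers for M = 1..5 and N = 100 samples.
P = [1.0, 0.5, 0.45, 0.44, 0.44]
fpe, aic, cat = order_criteria(P, 100)
p_fpe = int(np.argmin(fpe))
F = plotting_transform(np.log10(fpe), p_fpe)     # F(M) uses log10 of FPE
A = plotting_transform(aic, int(np.argmin(aic)))
C = plotting_transform(cat, int(np.argmin(cat)))
print("orders selected (FPE, AIC, CAT):",
      p_fpe + 1, int(np.argmin(aic)) + 1, int(np.argmin(cat)) + 1)
```

Since each criterion is referenced to its own minimum, the argument of the outer logarithm is never less than one and the plotted curves are non-negative.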


Landers and Lacoss [15] examined the method for complex sinusoids with
various signal-to-noise ratios. Their results indicated that where little
noise is present, AIC, FPE and CAT give an order number that produces
acceptable spectra. When added white noise was large, the criteria gave
inconsistent values and, in these cases, larger filter lengths produced
better spectra. We note that it is often the location and size of narrow
band components which are of scientific interest, but that the various order
criteria select filter orders based upon the entire spectrum, including
added white noise. For example, although large filter lengths produced the
best-fit spectra, the complex harmonic $\exp(i\omega_0 t)$ can be exactly
predicted by the one-point prediction error wavelet $(1, -\exp(i\omega_0))$.
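
The closing remark about the complex harmonic is easy to verify numerically. The sketch assumes unit sample spacing and takes the one-point prediction error wavelet to be (1, -exp(i ω0)), the form implied by forward prediction in Equation (2); the frequency value is arbitrary.

```python
import numpy as np

w0 = 0.9                                   # arbitrary angular frequency
t = np.arange(50)
x = np.exp(1j * w0 * t)                    # noise-free complex harmonic

wavelet = np.array([1.0, -np.exp(1j * w0)])
# "valid" convolution gives x[t] - exp(i*w0) * x[t-1] for t = 1 .. 49.
residual = np.convolve(x, wavelet, mode="valid")
print("largest prediction error:", np.max(np.abs(residual)))
```

The residual vanishes to rounding error, so a filter of length one suffices for the harmonic itself; the longer filters favored by the order criteria are needed to describe the added noise.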

For a single echo in a broadband primary, the complex log spectrum
looks like

\[ F(f) = \log\left(1 + a\, e^{i 2\pi f t}\right) \tag{8} \]

where t is the echo delay time and a the echo size. Four cycles of this
function for a equal to 0.5 and a delay time of 1, with 10% white noise added,
are plotted at the top of Figure 1. Note that frequency and time have been
interchanged. While the function is all-pole, it has an infinite number of
poles, that is to say an infinite order number. The error functions, shown
below the time series, indicate that the filter length should be greater
than twenty-three, though no clear minimum value is discernible. The maximum
entropy spectrum for a filter length of twenty-three is shown at the bottom,
the dashed line indicating the locus of the amplitudes of the exact
spectral peaks. The computed amplitudes vary by a few dB from the line and
up to the eighth order give the correct frequency. In general we use the
FPE, AIC, and CAT criteria as guides to picking the order number, reserving
the possibility that larger or smaller values of p can provide useful
information on the nature of the physical system being modeled. We now
proceed to use the Burg algorithm, Akaike's and Parzen's order number
criteria and maximum entropy spectral analysis to compute cepstra of time
series containing echoes.
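
The harmonic structure of Equation (8) can be made explicit with the series log(1 + z) = z - z²/2 + z³/3 - ..., which gives harmonics at multiples of the delay with amplitudes aⁿ/n. The sketch below checks this numerically for the noise-free case; it uses an ordinary discrete Fourier transform in place of the maximum entropy estimate, and the sampling (256 points over four cycles) is an assumption made for the illustration.

```python
import numpy as np

a, delay = 0.5, 1.0                  # echo size and delay from the text
n = 256
f = np.arange(n) * 4.0 / n           # four cycles of the log spectrum

# Noise-free complex log spectrum of Equation (8).
F = np.log(1.0 + a * np.exp(1j * 2.0 * np.pi * f * delay))

# Harmonic n of the delay falls on DFT bin 4n for this sampling.
coeffs = np.fft.fft(F) / n
harmonics = np.arange(1, 6)
amplitudes = np.abs(coeffs[4 * harmonics])
expected = a ** harmonics / harmonics

for h, got, want in zip(harmonics, amplitudes, expected):
    print(f"harmonic {h}: amplitude {got:.6f}, a^n/n = {want:.6f}")
```

The amplitudes never vanish at any harmonic, which is the sense in which the function has an infinite order number, and their decay as aⁿ/n gives the exact peak amplitudes against which an estimated spectrum can be compared.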
Cepstral Analysis


Our examples deal with echo detection in seismic data. The nature of
the first few seconds of a short-period seismogram for a shallow event is
determined by the shape of the wave that travels directly from the source
region to the seismometer and the underside reflection off the earth's
surface that results in the addition of a delayed and scaled version of the
direct wave. The reflection is referred to as the depth phase, since its
time delay coupled with the known elastic velocity structure in the upper
layers of the earth allows one to calculate the depth of the event. Since
the amplitudes of depth phases are often as large as the amplitudes of the
direct waves and since short-period seismic systems are usually quite
narrow band, depth phases that arrive while energy from the primary arrival
is still coming in cannot be clearly detected visually. Assuming that the
secondary arrivals differ from the primary wave by only a scale factor and
some dispersion due to anelastic attenuation, detection may in theory be
accomplished by cepstral analysis [12]. Unfortunately, when the echo is
only slightly delayed from the primary, the log spectrum over the band of
good signal-to-noise ratio contains only a few cycles of the primary harmonic
that peaks the cepstrum at the echo delay time. Under such conditions
normal spectral analysis of the log spectrum will not produce well-defined
cepstra. Thus we employ maximum entropy spectral analysis to obtain the
necessary resolution.
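
For orientation, the conventional cepstral principle can be sketched on a noise-free, broadband example. This is the ordinary FFT-based real cepstrum, not the maximum entropy cepstrum developed here, and the decaying-exponential wavelet, echo size and delay are assumptions chosen only for illustration.

```python
import numpy as np

def real_cepstrum(signal):
    """Inverse FFT of the log amplitude spectrum."""
    spectrum = np.fft.fft(signal)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

n, delay, echo = 256, 20, 0.5
wavelet = 0.5 ** np.arange(n)              # assumed broadband primary arrival
trace = wavelet.copy()
trace[delay:] += echo * wavelet[:n - delay]   # add delayed, scaled copy

ceps = real_cepstrum(trace)
# Skip the lowest quefrencies, which are dominated by the wavelet itself.
peak = 5 + int(np.argmax(ceps[5:n // 2]))
print("cepstral peak at sample", peak)
```

With the full band available and no noise the echo delay stands out clearly. The difficulty described above arises when only a narrow band of the log spectrum is reliable, so that the series being analyzed contains just a few cycles of the delay harmonic.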

In Figure 2, a synthetic seismogram with echoes at 1 and 3 seconds in
5% white noise is shown at the top. The complex log spectra and the power
spectra are shown below the seismogram. The nonlinear effects of noise can
be seen in the regions of the spectra where the signal power is low. To
minimize this effect, only the band from 0.5 to 2 hertz was used to compute
the complex Burg filter coefficients. The error functions corresponding to
these coefficients are shown at the bottom of the figure. The same functions
were computed for signal to noise ratios in the time signal of .001, .05,
.5 and 1, each time limiting the log spectral band to the region of good
signal to noise. The maximum entropy cepstra for these cases are shown in
Figure 3, for decreasing signal to noise ratios from top to bottom. While
the lowest noise case gives the exact result, signal to noise ratios up to
.5 give satisfactory results. Figure 4 shows the analysis for an echo
delayed by only four digitizing intervals in 5% white noise. Though the
delay time is not well resolved, the peak occurs at the correct time.

The following illustrates the method as applied to a real seismogram
recorded in Norway for a source in Nevada. It is known from instruments at
the source site that the depth phase will be delayed from the initial
arrival by .98 seconds. Referring to Figure 5, part (a) shows the recorded
seismogram and part (b) the tapered version that isolates the first arrival
and the depth phase. Taking the discrete fast Fourier transform of part
(b) and then the analytic complex logarithm results in the power spectrum,
real log spectrum and imaginary log spectrum shown in part (c). The effect
of the short period seismometer and anelastic attenuation has been compensated
for in the log spectrum. Selecting the band over which there is sufficient
power to ensure that the behavior of the imaginary part of the log spectrum
(the phase of the spectrum) is controlled by signal rather than noise
conditions, and removing linear trends from the real part (a source correction)
and the imaginary part (a simple time delay of the whole trace), produces the
reliable part of the complex log spectrum shown in part (e). The power spectrum
of this complex frequency function should have peaks at the delay and multiples
of the delay time of the depth phase. Since the delay time is known to be
near 1 sec, the log spectrum should have a primary periodicity of about 1
hertz. The total band is only 1.5 hertz long, so we wish to make a
spectral estimate of a series that contains only about 1.5 cycles of the
primary component. Applying the Burg algorithm to this complex frequency
function and calculating the Akaike FPE and AIC functions from the error
that results when the Burg coefficients are applied, we obtain the curves
shown in part (f). The model parameters, namely the prediction coefficients,
for the order corresponding to the minimum in the order number curves are
shown in part (g). The exact spectrum of the model, which is the estimated
spectrum of the log spectrum, or cepstrum, is shown in part (h). Detected in
the cepstrum is the echo delay time at .95 seconds and its first harmonic at
1.9 (2 x .95) seconds. The predicted time delay between the first peak of the
primary arrival and the first peak of the depth phase, which should be
reversed in sign, is shown in part (i). The difference between the known
and predicted delay time is less than 1 digitizing unit, the original data
being sampled at 20 hertz. The same technique applied [13,14] to other
sources and multiple recordings of single sources shows that for delays of
more than approximately .5 seconds the depth phase delay time falls
within approximately .1 seconds of the correct time. In many cases the
seismograms did not show any obvious change of character at the delay time.
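
The Burg step and the order-number criteria described above can be sketched as follows. This is a minimal illustration rather than the authors' program: it uses the common textbook form of the complex Burg recursion and of the Akaike FPE and AIC expressions, and the function names (`burg`, `fpe`, `aic`, `ar_spectrum`) are hypothetical. For the cepstral application the input series would be the band-limited complex log spectrum, so `dt` is the frequency spacing and the `freq` argument of `ar_spectrum` plays the role of delay time.

```python
import cmath
import math
import random


def burg(x, max_order):
    """Burg recursion for a real or complex series x.

    Returns a list of (coefficients, error_power) for orders 0..max_order,
    where coefficients = [1, a1, ..., am] is the prediction error filter.
    """
    n = len(x)
    f = list(x)                      # forward prediction errors
    b = list(x)                      # backward prediction errors
    a = [1.0 + 0j]
    power = sum(abs(v) ** 2 for v in x) / n
    results = [(a[:], power)]
    for m in range(1, max_order + 1):
        num = sum(f[t] * b[t - 1].conjugate() for t in range(m, n))
        den = sum(abs(f[t]) ** 2 + abs(b[t - 1]) ** 2 for t in range(m, n))
        k = -2.0 * num / den         # reflection coefficient, |k| <= 1
        prev = a + [0j]
        a = [prev[i] + k * prev[m - i].conjugate() for i in range(m + 1)]
        new_f, new_b = f[:], b[:]
        for t in range(m, n):
            new_f[t] = f[t] + k * b[t - 1]
            new_b[t] = b[t - 1] + k.conjugate() * f[t]
        f, b = new_f, new_b
        power *= 1.0 - abs(k) ** 2   # error power never increases with order
        results.append((a[:], power))
    return results


def fpe(power, n, m):
    """Akaike final prediction error for order m and n data points."""
    return power * (n + m + 1) / (n - m - 1)


def aic(power, n, m):
    """Akaike information criterion for order m and n data points."""
    return n * math.log(power) + 2 * m


def ar_spectrum(a, power, freq, dt=1.0):
    """Spectrum of the all-pole model at one frequency."""
    h = sum(c * cmath.exp(-2j * math.pi * freq * i * dt)
            for i, c in enumerate(a))
    return power * dt / abs(h) ** 2
```

A typical use would compute `burg(series, max_order)`, evaluate `fpe` or `aic` for each order, and take the coefficients at the minimum before evaluating `ar_spectrum` on a grid of delay times.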

Discussion 


The success of maximum entropy or autoregressive spectral analysis in
large part depends on how well one is able to model a process. In our
examples, where the physical nature of the phenomena dictated that all-pole
or prediction error models were appropriate, we were able to obtain useful
spectra. By using the Burg algorithm, model coefficients were produced
without making any unrealistic assumptions about the data outside the
actual observation interval. In each case, spectra consistent with known
data properties or internally consistent among independent data sets confirmed
that the technique was producing reliable results. While order number
criteria are not perfect and problems exist in determining model parameters,
the technique can provide useful results in cases where classical estimation
techniques are not applicable.

FIGURE 5: Maximum entropy cepstral analysis for an event with an
echo known to arrive at about .98 seconds.

Abstract


A new adaptive filter to reject clutter is derived using autoregressive
spectral analysis techniques. The adaptive filter performs open loop
processing, resulting in a shorter transient response, and is therefore suitable
for radar waveforms containing only a small number of samples. A number of
examples including application to Ballistic Missile Defense are presented to
demonstrate the performance capabilities of the new adaptive filter.

1.  Introduction 


This paper presents a new adaptive filter (AF) for clutter suppression,
utilizing an autoregressive (AR) spectral analysis technique. Such a filter
can be useful in tactical situations where the clutter environment is
changing, such as in air defense (chaff), airborne moving target indication
(MTI), and tank break-up clutter in ballistic missile defense. The AF of this
paper estimates the changing clutter spectrum parametrically and employs this
parametric estimate to adaptively notch out the clutter returns. Conceptually
the AF consists of a variable whitening filter (WF), followed by a variable,
modified matched filter (MMF). The manner in which these filters are varied
depends on the parametric spectral estimate of clutter determined from the
data. This is shown in Figure 1. Note that this structure is different from
the "estimator-subtractor" structure of sidelobe cancellers [1] or the
least-mean-square (LMS) adaptive filter [2]. The present adaptive filter is
an attractive alternative to the feedback structures of Applebaum [1] and
Widrow [2]. The absence of feedback loops assures a shorter transient
response for the filter, and the use of a Kalman filter for spectral analysis
provides a very rapid estimate of the clutter spectrum, thereby providing
quick convergence of the WF and MMF to "match" the clutter environment. This
latter feature is very attractive in rapidly changing clutter environments
like tank break-up during reentry of ballistic objects.

The operation of the adaptive filter may be roughly explained as follows.
The spectral analyzer estimates the clutter spectrum parametrically. Then
the parameters are used to automatically design the whitening filter, or
equivalently to place notches at the clutter frequencies. Because the target
signal of interest is modified in passing through the WF, the MMF is matched
to the new signal. Thus, as the clutter spectrum is changing, the AF tries
to maximize the signal to interference ratio adaptively. The technical
approach in obtaining the AF structure is presented in Section 2.

This new adaptive filter has been successfully tested with synthetic
data as well as with clutter from tank break-up in an actual ballistic
missile test, as recorded at the Kwajalein Missile Range (KMR). Clutter
suppression on the order of 20-30 dB better than the corresponding matched
filter has been achieved with these data. These results are presented in
Section 3 along with an analysis of the adaptive filter performance with a
simulated clutter source (a Markov process). Some key points in the adaptive
filter are discussed in Section 4. The paper concludes with a brief summary.

2.  Technical  Approach 

We consider the detection of a target signal S(t) in clutter C(t) and
white noise N(t) as a preliminary step in deriving the adaptive filter
structure. This can be posed as a hypothesis testing problem. Under hypothesis
$H_1$ the target is present and the return signal R(t) can be written as

$$H_1 : R(t) = S(t) + C(t) + N(t)$$

Under hypothesis $H_0$ the target is not present and the return signal can be
written as

$$H_0 : R(t) = C(t) + N(t)$$

The objective is to be able to detect the target in a strong clutter
environment, and minimize the false alarms due to clutter.

The first assumption in solving the problem is that clutter is a
correlated noise process. In accordance with this assumption, let the
autocorrelation function of C(t) be

$$\phi_c(t_1, t_2) = E[\,C(t_1)\,C(t_2)\,]$$

where E is the expected value operator. Let the thermal noise be a stationary
process with a power spectral density of $N_0/2$. In this case, the optimum
solution [3] can be obtained as shown in Figure 2.

The whitening filter (WF) decorrelates the interference C(t) + N(t) to a
thermal-noise-like signal $N_1(t)$. In the process of doing so, the signal
S(t), if present, is modified to $S_1(t)$. The output of the WF is given by:

$$H_1 : R_1(t) = S_1(t) + N_1(t)$$

$$H_0 : R_1(t) = N_1(t)$$

This corresponds to the classical problem of detecting the known signal
$S_1(t)$ in the white noise $N_1(t)$. The solution consists of passing $R_1(t)$
through a filter matched to the signal $S_1(t)$. This filter is called the
modified matched filter (MMF). In other words the problem of detecting a
signal in colored noise has been converted to a standard form. The
mathematical details may be found in [3].

The impulse response of the whitening filter is given by the solution of
an integral equation:

$$\int_{T_i}^{T_f} \phi_I(x,z) \left( \int_{T_i}^{T_f} h(u,v)\,h(u,x)\,du \right) dx = \delta(z-v), \qquad T_i \le z, v \le T_f \tag{1}$$

where $\phi_I(x,z) = \phi_c(x,z) + \frac{N_0}{2}\,\delta(x-z)$

is the autocorrelation of the interference. Note that in the integral
equation, the quantity inside the parenthesis is the inverse kernel of the
correlation function of the interference. Naturally, the whitening filter
impulse response h(t,x) is the solution to the integral equation with this
inverse kernel. The above integral equation can be derived by assuming that the
clutter process is nonstationary. This leads to a time varying whitening
filter. In practice, clutter is not stationary, and the interval of
observation is only finite. Therefore, the true optimum solution may be
expected to involve a time varying whitening filter and MMF. However, if a
short-time clutter spectrum is defined, it can be used to obtain the whitening
filter. As the clutter spectrum varies with time, the impulse responses of the
WF and MMF will vary, ensuring real-time adaptation to the changing environment.

In general the clutter correlation function $\phi_c(t_1, t_2)$ is not known and
must be estimated. Or, equivalently, the short term spectrum $\Phi_c(f)$ must be
estimated, as in this study. Over a short period of time the clutter is
modeled as a stationary autoregressive (AR) process; the parameters of the AR
process are estimated; and, in turn, these estimates are used to automatically
design the whitening filter and MMF. Because the estimates depend on the
data, they change with time, resulting in a time varying receiver structure.

Specifically, the clutter samples are modelled as an M-parameter
autoregressive (AR) process. That is

$$C(k) = \sum_{i=1}^{M} \alpha_i\, C(k-i) + e(k) \tag{2}$$

where the $\alpha_i$'s are some constants, e(k) is a white noise process with
unit variance and C(k-i) is the clutter sample lagging by i samples from the
present sample. It is simple to show in this case that the power spectrum of
the clutter process is given by

$$\Phi_c(f) = \frac{1}{\left| 1 - \sum_{i=1}^{M} \alpha_i \exp(-j 2\pi f i T) \right|^{2}} \tag{3}$$

where T is the sampling interval.

Let $\hat{\alpha}_i$ be the estimate of $\alpha_i$ based on the observation of
the clutter samples. Then, an estimate of the clutter spectrum is given by

$$\hat{\Phi}_c(f) = \frac{1}{\left| 1 - \sum_{i=1}^{M} \hat{\alpha}_i \exp(-j 2\pi f i T) \right|^{2}} \tag{4}$$

In this case, the whitening filter (assuming a large clutter to noise ratio)
is given by

$$H(f) = 1 - \sum_{i=1}^{M} \hat{\alpha}_i \exp(-j 2\pi f i T) \tag{5}$$
Note that the above whitening filter can be realized as a tapped delay line
with tap weights given by 1 and $-\hat{\alpha}_i$, and has a simple structure
which can be easily changed as the $\hat{\alpha}_i$'s vary. This was one of the
motivating factors in modeling clutter as an AR process. Other factors include
the noted success with which AR models have been applied to speech signals,
which are also non-stationary processes.
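
The tapped-delay-line realization can be illustrated with a short sketch. It assumes real-valued samples, zero initial conditions, and known AR parameters (in the paper they are estimates); the names `whiten`, `wf_response`, and `ar_spectrum` are hypothetical and chosen only for this example.

```python
import cmath
import math
import random


def whiten(samples, alpha):
    """Tapped delay line with weights 1, -alpha[0], ..., -alpha[M-1]."""
    out = []
    for k in range(len(samples)):
        # One-step prediction from the M most recent past samples.
        pred = sum(alpha[i] * samples[k - 1 - i]
                   for i in range(len(alpha)) if k - 1 - i >= 0)
        out.append(samples[k] - pred)
    return out


def wf_response(alpha, f, T=1.0):
    """H(f) = 1 - sum_i alpha_i exp(-j 2 pi f i T), i = 1..M."""
    return 1 - sum(a * cmath.exp(-2j * math.pi * f * (i + 1) * T)
                   for i, a in enumerate(alpha))


def ar_spectrum(alpha, f, T=1.0):
    """Parametric clutter spectrum estimate 1 / |H(f)|^2."""
    return 1.0 / abs(wf_response(alpha, f, T)) ** 2


def simulate_ar(alpha, noise):
    """Generate AR clutter driven by the given white noise sequence."""
    c = []
    for k, e in enumerate(noise):
        past = sum(alpha[i] * c[k - 1 - i]
                   for i in range(len(alpha)) if k - 1 - i >= 0)
        c.append(past + e)
    return c
```

With exact parameters the whitening filter returns the driving noise sequence, and the spectrum estimate peaks where the filter response has its notch.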

Many different algorithms exist to estimate the parameters $\alpha_i$, and
these invariably involve time averages of correlation functions and matrix
inversions [4, 5, 6]. In this paper, a Kalman filtering technique is
employed to estimate the WF parameters. The advantages are that no matrix
inversion is involved in this case, and the estimates are obtained
sequentially in real time.

To utilize Kalman filtering techniques, the spectral parameter estimation
problem must be first converted to the state variable form. Accordingly
the AR process parameters $\alpha_1, \alpha_2, \ldots, \alpha_M$ are defined as
state variables and modeled as constants:

$$\alpha(k+1) = \alpha(k), \qquad M \times 1 \text{ vector} \tag{6}$$

That is, the state transition matrix is an identity matrix. The process
noise term, usually present in Kalman filtering, is absent in this case.

The observation is a scalar variable C(k+1) given by a time varying linear
combination of the state variables $\alpha(k)$ plus the white noise term e(k+1).
This can be written compactly in vector notation as

$$C(k+1) = H(k+1)\,\alpha(k+1) + e(k+1) \tag{7}$$

where

$$H(k+1) = [\,C(k) \;\; C(k-1) \;\; \ldots \;\; C(k-M+1)\,] \tag{8}$$

Let the variance of the white noise component be denoted by $\sigma^2$. By
applying the standard discrete Kalman filtering algorithm [7] to the message
and observation models given by Eqs. (6) and (7), the following equations are
obtained:

$$\hat{\alpha}(k+1) = \hat{\alpha}(k) + K(k+1)\,[\,C(k+1) - H(k+1)\,\hat{\alpha}(k)\,] \tag{9}$$

$$V_{\alpha}(k+1) = [\,I - K(k+1)\,H(k+1)\,]\,V_{\alpha}(k) \qquad \text{(covariance update equation)} \tag{10}$$

$$K(k+1) = \frac{V_{\alpha}(k)\,H^{\dagger}(k+1)}{\sigma^2 + H(k+1)\,V_{\alpha}(k)\,H^{\dagger}(k+1)} \qquad \text{(gain vector computation)} \tag{11}$$

In the above equations, the superscript $\dagger$ indicates conjugate transpose.

Note that $H(k+1)\,\hat{\alpha}(k)$ is the one step prediction of the present
clutter observation sample C(k+1). The term $[\,C(k+1) - H(k+1)\,\hat{\alpha}(k)\,]$
can be denoted by $\nu(k+1)$ and is known as the residual or innovation process.
It is basically a sequence of "new information" and is a white process [8]. Note
that the algorithm does not need to invert any matrix. This is because the
observation C(k+1) is a scalar quantity.
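
The recursion of Eqs. (9) through (11) can be sketched in a few lines. The sketch assumes real-valued clutter samples (so the conjugate transpose reduces to a transpose), a diagonal initial covariance `v0`, and a zero initial estimate; these choices and the function name `kalman_ar` are illustrative assumptions, not details given in the paper.

```python
import random


def kalman_ar(clutter, order, noise_var=1.0, v0=100.0):
    """Sequentially estimate AR parameters from real clutter samples."""
    m = order
    alpha = [0.0] * m
    V = [[v0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(m, len(clutter)):
        # Observation row: the M most recent past samples, newest first.
        h = [clutter[k - 1 - i] for i in range(m)]
        Vh = [sum(V[i][j] * h[j] for j in range(m)) for i in range(m)]
        denom = noise_var + sum(h[i] * Vh[i] for i in range(m))  # scalar
        gain = [v / denom for v in Vh]                           # Eq. (11)
        innovation = clutter[k] - sum(h[i] * alpha[i] for i in range(m))
        alpha = [alpha[i] + gain[i] * innovation
                 for i in range(m)]                              # Eq. (9)
        hV = [sum(h[i] * V[i][j] for i in range(m)) for j in range(m)]
        V = [[V[i][j] - gain[i] * hV[j] for j in range(m)]
             for i in range(m)]                                  # Eq. (10)
    return alpha, V
```

Because the denominator in the gain is a scalar, each update costs only a few vector products, which is the practical meaning of "no matrix inversion" above.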

The  effect  of  using  M parameters  for  the  AR  process  is  to  interpolate 
the  clutter  spectrum  with  the  frequency  response  of  M poles  in  the  complex 
frequency  plane.  The  number  chosen  for  M is  generally  a compromise  between 
good  spectral  estimation  and  desirability  of  short  transient  response.  It 
is  also  worthwhile  noting  that  the  AR  process  model  can  represent  discrete 
target  interference  as  well.  Therefore,  the  method  can  also  be  used  for 
reducing  interference  from  other  targets  in  a multiple  target  environment. 

Certain drawbacks exist in the Kalman filter algorithm for clutter
spectral estimation. If the clutter samples are present only up to a certain
time (say N samples), beyond which only noise is present, the estimates do
not drop off to a small value, and are preserved with a memory typical of
recursive techniques. In order to reduce this memory, a number of solutions
have been proposed. One of the more attractive schemes is to have an
exponentially fading memory for the Kalman spectral estimator. This is
implemented by artificially increasing the covariance $V_{\alpha}(k)$ at each
stage by a certain percentage [9]. This method has been fairly effective in the
present study. Another method, called the limited memory filter [10], is
a little more complicated, and periodically erases the memory of the
estimator completely. In many cases, it is not clear how this periodicity can
be chosen a priori. On the other hand, the exponentially fading memory length
can be chosen such that the covariance is not changed by more than 20% each
time. This increase reflects the uncertainty in the estimates, and is a
divergence prevention technique. In such a case, the results are only near
optimum.
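
The effect of the fading memory can be illustrated with a first-order (M = 1) sketch in which the scalar covariance is multiplied by a factor `inflate` before every update. The factor, the initial covariance `v0`, and the function name are assumptions made for this example; the paper states only that the covariance is increased by a certain percentage at each stage.

```python
import random


def fading_kalman_ar1(clutter, inflate=1.0, noise_var=1.0, v0=100.0):
    """First-order AR parameter estimator with covariance inflation.

    inflate = 1.0 gives the ordinary growing-memory estimator;
    inflate > 1.0 discounts old data roughly exponentially.
    """
    alpha, v = 0.0, v0
    history = []
    for k in range(1, len(clutter)):
        v *= inflate                      # fading memory step
        h = clutter[k - 1]
        gain = v * h / (noise_var + h * v * h)
        alpha += gain * (clutter[k] - h * alpha)
        v = (1.0 - gain * h) * v
        history.append(alpha)
    return history
```

When the clutter parameter changes partway through the record, the inflated version should settle near the new value, while the plain version stays between the old and new values because it continues to weight the early samples.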


3.  Results 


In this section, we present a number of examples illustrating the
studies conducted with the proposed adaptive filter. In the first part of
this section we derive the structure of the adaptive filter for the case of
an N-pulse weighted burst waveform. This structure is used in the examples
that follow. In the second part of this section we compare the
signal-to-clutter (S/C) and signal-to-interference (S/I) ratios theoretically
obtainable by an optimum interference rejection filter (the interference
statistics exactly known) with those achieved by the adaptive filter for the
case of a Markov random process as an interference source.

In the third part of the section we demonstrate the ability of the
adaptive filter to place nulls in its frequency response corresponding to
the interference frequencies. The final part of this section presents the
application of the adaptive filter to suppress the tank break-up clutter
of a re-entry booster in ballistic missile defense (BMD) applications.

3.1  Adaptive  Filter  Structure  For  An  N-Pulse  Burst 

Let the transmit weights of an N-pulse burst waveform be
$\{a_1, a_2, \ldots, a_N\}$, which may be complex in general. Then the sampled
return from a desired target of unit strength, zero range, and doppler velocity
$\omega_d$ rad/sec will be

$$r(k) = a_k \exp(j\omega_d kT), \qquad k = 1, 2, \ldots, N \tag{12}$$

The matched filter to process this return will be a tapped delay line with
weights $\{b_k\}$ where

$$b_k = a^{*}_{N-k+1} \exp(-j\omega_d kT), \qquad k = 1, 2, \ldots, N \tag{13}$$

Without loss of generality we can set $\omega_d = 0$ in all our discussions, in
which case the exponential in Eq. (13) can be set to unity. Let the spectral
parameter estimates corresponding to the interference be
$\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_M$ at any given time
sample. Then the impulse response of the whitening filter is given by
$\{1, -\hat{\alpha}_1, -\hat{\alpha}_2, \ldots, -\hat{\alpha}_M\}$ and can be
implemented as a tapped delay line. The return signal from the desired target
is passed through the whitening filter along with the interference. In this
process, the target signal is modified and corresponds to the sequence
$\{c_k\}$, k = 1, 2, ..., M+N-1, given by convolving $\{a_k\}$ with
$\{1, -\hat{\alpha}_1, \ldots, -\hat{\alpha}_M\}$. In order to maximize the
signal-to-noise ratio at the output of the processor, the filter following
the whitening filter must be matched to this modified signal return $\{c_k\}$.
Thus the modified matched filter (MMF) has an impulse response $\{d_k\}$, and is
given by

$$d_k = c^{*}_{M+N-k}, \qquad k = 1, 2, \ldots, M+N-1 \tag{14}$$

The MMF can be implemented as a tapped delay line with weights $d_k$. The
output of the MMF can now be processed further for detection.

The receiver structure for processing the N-pulse burst data is shown
in Figure 3. This structure is basically used in our subsequent examples
and discussions. Note that in practice we can combine the WF and MMF to
implement a single tapped delay line filter with weights $g_k$, k = 1, 2, ...,
N+2M-1, where $\{g_k\}$ is the result of convolving
$\{1, -\hat{\alpha}_1, -\hat{\alpha}_2, \ldots, -\hat{\alpha}_M\}$ with the
sequence $\{d_k\}$.
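
The construction of the WF, MMF, and combined weights can be sketched as below for the zero-doppler case. The sketch assumes the MMF is the conjugate time reverse of the whitened signal, and it keeps whatever length a full linear convolution produces: with all M+1 whitening taps that is N+M samples for the modified signal, which may differ by one from the index ranges quoted in the text depending on the counting convention. The function names are hypothetical.

```python
def convolve(x, y):
    """Full linear convolution of two sequences."""
    out = [0j] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out


def adaptive_filter_weights(transmit_weights, alpha_hat):
    """Return (wf, c, d, g) for a zero-doppler N-pulse burst.

    wf: whitening filter taps {1, -alpha_1, ..., -alpha_M}
    c : target return after the whitening filter
    d : modified matched filter (conjugate time reverse of c)
    g : single tapped delay line equivalent to WF followed by MMF
    """
    wf = [1.0] + [-a for a in alpha_hat]
    c = convolve(transmit_weights, wf)
    d = [v.conjugate() for v in reversed(c)]
    g = convolve(wf, d)
    return wf, c, d, g
```

Passing the transmitted burst through `g` gives the same output as passing it through the WF and then the MMF, and that output peaks at the energy of the whitened signal.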

3.2  S/C  and  S/I  Performance  of  the  Adaptive  Filter 

Consider that the clutter is a stationary, first-order Gauss-Markov
process generated by

$$c_{k+1} = \rho\, c_k + w_k$$

where $c_k$ is the clutter sample interference, $|\rho| < 1$ and $w_k$ is a
white Gaussian sequence. Assuming zero mean for the processes involved, the
covariance matrix of the interference for N samples is given by the N x N
Toeplitz matrix

$$\phi_c = \begin{bmatrix} 1 & \rho & \cdots & \rho^{N-1} \\ \rho & 1 & \cdots & \rho^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \cdots & 1 \end{bmatrix}$$

Let the N-sample signal return be denoted by the vector S. Let the return
vector R denote the sum of signal return, clutter, and thermal noise

$$R = S + C + V$$

where C is an N vector of clutter samples and V is an N vector of thermal
noise samples. Let the vector $W_{opt}$ denote the weights of the optimum
tapped delay line filter. Then the S/C and S/I ratios are given by

$$S/C = \frac{\left| W_{opt}^{\dagger}\, S \right|^{2}}{W_{opt}^{\dagger}\, \phi_c\, W_{opt}}$$

and

$$S/I = \frac{\left| W_{opt}^{\dagger}\, S \right|^{2}}{W_{opt}^{\dagger}\, \phi_I\, W_{opt}}$$

where the superscript $\dagger$ denotes conjugate transpose and $\phi_I$ refers
to the interference covariance matrix, the sum of the clutter and thermal noise
covariance matrices $\phi_c$ and $\phi_N$. For the case of the adaptive filter,
we can obtain S/C and S/I with $W_{opt}$ replaced by the adaptive filter
weights $W_a$.

In order to evaluate S/C and S/I ratios for the adaptive filter for
various values of $\rho$, a Monte Carlo simulation was performed. These ratios
were computed at various stages of adaptation. To compare the performance
of the adaptive filter with the optimum filter on the same basis, the noise
gain of each filter was normalized to unity. The results of the simulation
study are presented in Figure 4.
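
The S/C and S/I expressions can be evaluated numerically with a short sketch. It assumes the standard result that the optimum weights are proportional to the inverse interference covariance applied to the signal vector, which the text does not state explicitly, and it scales the clutter covariance by an assumed clutter power. All function names and numerical values are illustrative.

```python
import cmath
import math


def markov_covariance(n, rho, power=1.0):
    """N x N Toeplitz covariance with entries power * rho**|i - j|."""
    return [[power * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def quad(w, phi):
    """Real part of the quadratic form w^H phi w."""
    n = len(w)
    return sum(w[i].conjugate() * phi[i][j] * w[j]
               for i in range(n) for j in range(n)).real


def ratios(w, s, phi_c, phi_i):
    """Return (S/C, S/I) for tapped delay line weights w and signal s."""
    gain = abs(sum(wi.conjugate() * si for wi, si in zip(w, s))) ** 2
    return gain / quad(w, phi_c), gain / quad(w, phi_i)
```

For example, with N = 8, a correlation parameter of 0.9, clutter power 100, and unit thermal noise, `solve(phi_i, s)` gives weights whose S/I is at least that of the plain matched filter `w = s` for any signal vector `s`.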

3.3  Adaptive  Filter  Frequency  Response 

The adaptive filter of this paper is derived essentially from a
frequency domain viewpoint. The filter automatically adjusts its weights to
place nulls in its frequency response corresponding to the significant
frequency components of the clutter spectrum. We demonstrate this ability
of the adaptive filter in this section.

Clutter data was simulated by means of two closely spaced discrete
frequency scatterers. The clutter-to-noise ratio was 37 dB. Figure 5
presents the adaptive filter frequency response at the end of 15 adaptation
samples. The clutter spectrum is also shown in the figure for comparison.
Note that the null in the adaptive filter is 100 dB down from the zero
frequency gain. The number of spectral parameters used in this case was 4.
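
How a null depth relative to the zero frequency gain is read from tap weights can be shown with an idealized sketch. It does not reproduce the experiment above: instead of a four-parameter Kalman estimate, it uses the exact two-tap-pair prediction error filter for two noiseless spectral lines, whose zeros lie on the clutter frequencies. The frequencies, the function names, and the numerical floor that guards the logarithm are all assumptions for illustration.

```python
import cmath
import math


def two_line_notch(f1, f2, T=1.0):
    """Tap weights with zeros exactly at the two line frequencies."""
    z1 = cmath.exp(2j * math.pi * f1 * T)
    z2 = cmath.exp(2j * math.pi * f2 * T)
    # (1 - z1 z^-1)(1 - z2 z^-1) expanded into three taps.
    return [1.0, -(z1 + z2), z1 * z2]


def response_db(weights, f, T=1.0):
    """Gain at frequency f in dB relative to the zero frequency gain."""
    def gain(freq):
        return abs(sum(w * cmath.exp(-2j * math.pi * freq * k * T)
                       for k, w in enumerate(weights)))
    # Floor avoids log10(0) when the response is numerically exactly zero.
    return 20.0 * math.log10(max(gain(f), 1e-300) / gain(0.0))
```

With lines at normalized frequencies 0.20 and 0.23, the response on each line is limited only by rounding error, while the response midway between them is a few tens of dB below the zero frequency gain.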

3.4 BMD Applications

In a BMD environment, the clutter due to tank break-up (TBU) during
re-entry into the atmosphere reduces the visibility of a threat (re-entry
vehicle), and also increases the data processor load because of a large
number of false alarms. In this case, the transmitted waveform is a coherent
burst of 16 pulses with 50 dB Chebyshev weighting. The adaptive filter
structure derived in the first part of this section was applied to this
problem using data from an actual ballistic missile test as recorded by a
radar at KMR. The data consisted of only the tank break-up clutter. Results
are given for the outputs of the matched filter (MF) as well as the adaptive
filter (AF) for purposes of comparison. Figure 6 shows the MF and AF outputs
for TBU data at a low altitude. This is a relatively low strength clutter
region. From the figure, a clutter suppression on the order of 10 dB better
than the matched filter is easily noted. Figure 7 shows the MF and AF
outputs corresponding to TBU data from the heavy clutter region (high
altitude). The spikes are ambiguities of a large tank fragment at the expected
target doppler velocity, and are processed as a tank return, with no
suppression. The additional clutter, which is seen between any two spikes, is
reduced significantly by the adaptive filter. Around the region of expected
target location, the suppression is on the order of 20-30 dB more than the
matched filter.

Many  sets  of  these  TBU  data  have  been  successfully  processed  with  the 
adaptive  filter  demonstrating  its  clutter  suppression  capability  in  real 
world  applications. 


4.  Discussion 


The adaptive filter presented in this paper is an open loop technique,
unlike the sidelobe canceller [1] or Widrow's adaptive filter [2]. This
generally reduces the transient response duration. The convergence of the
adaptive filter to the optimum filter depends on the convergence of the
spectral estimator. In all our studies, this convergence has not been a
problem. In particular, in BMD applications where only 16 samples are
available for processing, the conventional feedback techniques appear to
be inadequate, while the new adaptive filter has performed satisfactorily.
This seems to be primarily due to the rapid identification of clutter
spectral parameters by the Kalman filter technique.

Although the results are generally satisfactory, some difficulties are
encountered in using the new adaptive filter. First, a general theoretical
analysis seems to be very difficult because of the time varying nature of
the filter. This is partially solved in part 2 of the previous section
where the Markov interference effects were analyzed experimentally. Another
tool used often is the frequency response for particular experiments. This
was done in part 3 of the previous section. Yet further analysis is
necessary in relating the Kalman filter convergence, and the state error
variances, to the convergence of the adaptive filter. Also important is the
study of the effects of divergence prevention techniques used in the Kalman
filter. The key difficulty in developing the above analysis is that the
variances in the Kalman filter are not pre-computable.

Secondly, note that the adaptive filter has an impulse response longer
than the signal duration. This difficulty can be overcome by properly
truncating the adaptive filter impulse response to signal duration. We can
explain the increased duration of the impulse response as follows. Note that
the theoretical impulse response of the whitening filter is given by Eq. (1).
This requires that the whitening filter be truly time varying for finite
observation intervals. In our solution, we are replacing this time varying
response by an estimate of the WF impulse response based on assumptions of
an infinite observation interval and stationarity. This estimate, however,
varies with time, resulting in a time varying filter impulse response. But
the time varying nature of the adaptive filter impulse response must be
distinguished from that obtained by solving Eq. (1). In this sense the
adaptive filter is only suboptimum, but the present implementation appears
to be the only effective practical solution. It is worth noting that the
conventional adaptive filters also have a similar difficulty in processing
finite observation data. In other words, the time varying nature of the
conventional adaptive filters is due to the changing estimates of the truly
optimum weights, and does not represent the true time varying nature implicit
in Eq. (1).

There is also another aspect to the time varying nature of all adaptive
filters. If the interference environment is changing, these filters adjust
their weights automatically to cancel out the interference. This time
variation of the weights of the filter due to the changing environment is
relatively slow compared to the sample-to-sample time variations due to random
fluctuations.

In  spite  of  the  above  mentioned  shortcomings,  the  new  adaptive  filter 
has  performed  satisfactorily  in  practical  applications,  such  as  in  Ballistic 
Missile  Defense  clutter  suppression. 

5.  Conclusion 


In this paper we presented a new open loop approach to adaptive
filtering. By modeling the interference as an autoregressive process we could
use a parametric spectral estimation technique directly yielding the whitening
filter parameters. In particular, the Kalman filter was used to effectively
identify the parameters. The clutter suppression ability of the new adaptive
filter was demonstrated with applications to synthetic data and a ballistic
missile defense problem.

A future paper will present the additional investigation currently in
progress. This includes further theoretical investigations, simple spectral
estimation algorithms using stochastic approximation, and applications to
spread spectrum communication, air-to-air seekers and airborne MTI.

FIGURE 1: Conceptual Structure of the Adaptive Filter

FIGURE 2: Optimum Receiver for Clutter Environment

FIGURE 3: Adaptive Filter Structure to Process
Coherent Burst of N Weighted Pulses

FIGURE 4: (a) S/I Performance and (b) S/C Performance of the AF Relative to the
Optimum Filter. (Signal is frequency offset from the clutter source)

FIGURE 5: AF Frequency Response and Clutter Spectrum

FIGURE 7: (a) Adaptive Filter, and (b) Matched Filter Response
to TBU Data at High Altitude

Abstract


A method for estimating the doppler spectrum of targets having a
continuous distribution in both range and velocity is discussed. This technique
utilizes the random signal radar to achieve excellent range resolution and
freedom from ambiguity, and the autoregressive spectral estimator to achieve
excellent resolution in frequency. It is shown that the output of the
wideband correlator of the random signal radar has a spectral density that is a
replica of the scatterer density function of the target in its velocity
dimension. The autoregressive spectral estimator provides a computationally
efficient method of estimating this spectral density with outstanding
resolution characteristics.


Introduction 


The class of targets considered here is made up of targets that are non-rigid
and continuously distributed in range and velocity. This implies that
a target can be modeled as a collection of scatterers that are in relative
motion with respect to one another and are sufficiently close in range-doppler
space that individual scatterers cannot be resolved. Examples of such targets
are precipitation, weather formations, the ocean surface, foliage, chaff, etc.
It is often of interest to know the distribution of scatterer velocities,
particularly as a function of range. In meteorological measurements, this
information makes it possible to distinguish vortex formations from non-vortex
ones, or to measure wind shear as a function of altitude. Ocean surface
measurements reveal wave heights and sea state, and foliage measurements can
determine the characteristics of certain classes of clutter and aid in
intrusion detection [1], [2].

In most of the applications listed above, it is necessary to have excellent
resolution in both range and velocity simultaneously. This requires
that the radar signal have a large time-bandwidth product. The most widely
used method of achieving large time-bandwidth products for radar signals is
through the use of pulse compression by frequency modulation or pseudorandom
binary phase coding of the transmitted signal. Such signals are periodic in
nature and therefore exhibit ambiguous responses in both range and doppler.
Furthermore, they possess significant time and frequency sidelobes that
result in erroneous responses for continuously distributed targets.

Many of the difficulties inherent in periodic waveforms can be overcome
by using nondeterministic signals. The use of random signals has been
thoroughly investigated both theoretically and experimentally and their utility
demonstrated for many types of radar detection [1], [2], [3]. It is this
type of signal that is considered in the present discussion.

Random  Signal  Radar  Output 

Only  those  aspects  of  the  operation  of  the  random  signal  radar  that  are 
essential  to  the  present  application  are  reviewed  here.  The  transmitted 
signal  consists  of  bursts  of  wideband  random  noise.  Samples  of  this  random 
signal  are  taken  at  the  time  of  transmission,  converted  to  binary  form,  and 
delayed  in  a multi-stage  shift  register.  The  received  signal  is  also  sampled 
and  crosscorrelated  with  the  delayed  samples  from  the  transmitted  signal  in  a 
polarity-coincidence  correlator.  After  filtering,  the  correlator  output  is 
proportional to the envelope of the crosscorrelation function of the transmitted
signal and the received signal and provides a fully coherent detection
of the received signal. A block diagram of such a system is shown in Figure 1.


The peak of the filtered output, as a function of correlator delay,
occurs at $\tau = 2R/c$ and corresponds to the round trip travel time to the
target. The shape of the correlation function is determined by the spectral
density of the transmitted signal and, hence, time sidelobes can be controlled
readily with a simple filter or by shaping the sampling pulse. The range
resolution is determined by the RF bandwidth of the transmitted signal and is
closely approximated by


where B is the bandwidth in Hz and c is the velocity of light.


When the target is moving, the envelope of the correlation function is
modulated by a sinusoid at the frequency of the doppler shift. Velocity
resolution is given by

$$\Delta v = \frac{1}{2} \lambda_c W \qquad (2)$$

where $\lambda_c$ is the wavelength of the center frequency of the transmitted spectrum
and W is the bandwidth (in Hz) of the correlator output filter centered at
the doppler frequency being measured.


Since the transmitted signal is purely random, there is no ambiguity in
range  and  the  pulse  repetition  rate  can  always  be  made  high  enough  to  avoid 
any  doppler  ambiguities. 

If the transmitted signal is $x(t)$, the return signal from the kth
scattering point, having a radial velocity of $v_k$, is $a_k x[t - \tau_k(t)]$, where $a_k$ is
the relative magnitude of the return and

$$\tau_k(t) = \tau_k - \alpha_k t \qquad (3)$$

in which $\alpha_k$ is the delay rate. If there are K such scattering points, the total signal
into the correlator is

$$y(t) = \sum_{k=1}^{K} a_k \, x[(1+\alpha_k)t - \tau_k] + n(t) \qquad (5)$$

where  n(t)  is  the  system  noise. 

The  correlator  also  delays  and  time  scales  the  reference  signal  so  that 
it  can  be  represented  as 

$$r(t) = x[(1+\alpha_r)t - \tau_r] \qquad (6)$$

The received signal and the reference signal are multiplied together in the
correlator and the product is filtered to remove all high-frequency components.
Thus, the correlator output is an approximation to the expected value of
$[y(t)r(t)]$ and may be expressed as

$$E[y(t)r(t)] = R_{yr}(t) = \sum_{k=1}^{K} a_k R_x[(\alpha_r - \alpha_k)t - (\tau_r - \tau_k)] \qquad (7)$$

where $R_x(\cdot)$ is the autocorrelation function of $x(t)$. Since the transmitted
signal is a bandpass function around some center frequency, $f_c$, its
autocorrelation function will be of the form $R_c(t) \cos \omega_c t$. Thus, (7) becomes

$$R_{yr}(t) = \sum_{k=1}^{K} a_k R_c[(\alpha_r - \alpha_k)t - (\tau_r - \tau_k)] \cos\{\omega_c[(\alpha_r - \alpha_k)t - (\tau_r - \tau_k)]\} \qquad (8)$$

Each term of this expression is oscillatory at a low frequency of $(\alpha_r - \alpha_k) f_c$
and has an envelope that varies in accordance with $R_c[(\alpha_r - \alpha_k)t - (\tau_r - \tau_k)]$.

Thus, the correlator output is the linear superposition of the coherent return
from each of the scattering points.

If  the  scattering  points  are  separated  in  range  by  an  amount  greater 
than $\Delta R$, as given by (1), or are separated in velocity by an amount greater
than $\Delta v$, as given in (2), then they can be observed individually and the
range and the velocity of each one measured. This capability of the radar
may be a great advantage in identifying the characteristics of a rigid target.
However, in the case of continuously distributed targets, as considered here,
the  scatterers  are  not  separated  in  either  range  or  velocity  by  an  amount 
sufficient  to  be  resolvable.  Hence,  individual  scattering  points  cannot  be 
identified  and  the  observer  must  be  content  with  measuring  the  distribution 
of  velocities  in  each  range  cell. 

Model  for  Distributed  Targets 

In situations being considered here, it is convenient to define a scatterer
density function, $\rho(\alpha,\tau)$, that is a measure of how the scattering points
are distributed in both range and velocity. This density function is defined
such that the quantity $\rho(\alpha,\tau)\,\Delta\alpha\,\Delta\tau$ has the physical significance of being the
average number of scattering points having a velocity (range rate) in the
increment from $\alpha$ to $\alpha + \Delta\alpha$ and lying in the range increment corresponding to
delays between $\tau$ and $\tau + \Delta\tau$. Furthermore, the integral over all values of $\alpha$
is proportional to the usual radar cross-section per unit volume.

It can be shown [4] that the expected value of the correlator output
becomes

$$R_{yr}(t) = \iint \rho(\alpha,\tau) \, a(\tau) \, R_x[(\alpha_r - \alpha)t - (\tau_r - \tau)] \, d\alpha \, d\tau \qquad (9)$$

where $a(\tau)$ is the average attenuation coefficient of the returns from a range
having a delay of $\tau$ and may include the fourth power dependence on the range.

Although $\rho(\alpha,\tau)$ is defined for distributed targets, it can be applied
also to the discrete-target case by noting that in this case

$$a(\tau) \, \rho(\alpha,\tau) = \sum_{k=1}^{K} a_k \, \delta(\alpha - \alpha_k) \, \delta(\tau - \tau_k) \qquad (10)$$

Substituting  (10)  into  (9)  and  carrying  out  the  integration  immediately  yields 
the  result  given  previously  in  (8). 

When the scatterer density is smoothly distributed over a substantial
range of $\alpha$ and $\tau$ values, so that $\rho(\alpha,\tau)$ in (9) is changing slowly compared to
the signal autocorrelation function, then (9) will yield a value that is very
nearly zero. This is because $R_x(\cdot)$ is a bandpass function and has zero net
volume in $\alpha$ and $\tau$. The physical meaning of this is that the correlator
output, at any given time, consists of the linear superposition of many
components of the form described by (8) with random frequencies and starting times,
and that the instantaneous sum of these is just as likely to be negative as
positive. As a consequence, the correlator output cannot be used to identify
the parameters of any one scattering point as it could in the resolvable
discrete target case.

However, in spite of the above difficulty, the correlator output does
contain information regarding the scatterer density function, $\rho(\alpha,\tau)$. In
particular, the time function $R_{yr}(t)$ is a sample function from a random
process having a well-defined spectral density that is related to the
correlator delay and the velocity distribution of the scatterers at that delay.
This spectral density is given by [4]


vf>  ■ 2 Vo + 2 pcxo  * ^T)  pk  * -7s-  • v 


(11)


where

$X_o$ = spectral density of the transmitted signal in bandwidth B
$P_T$ = $2 X_o B$, transmitted signal power
$N_o$ = spectral density of the receiver noise, assumed white
$P_c$ = total received signal power from all ranges
$f_o$ = a frequency offset (greater than the largest doppler frequency)
introduced in one channel of the receiver to avoid doppler foldover
$\tau_o$ = delay of the reference signal

The total received signal power is obtained from


$$P_c = \iint P_T \, a^2(\tau) \, \rho(\alpha,\tau) \, d\alpha \, d\tau \qquad (12)$$


and can be used to relate the scatterer density function to the radar
cross-section per unit volume. If $\eta(R)$ is the radar cross-section per unit volume
as a function of range, then from the usual radar equation the total received
signal power is


P 

c 


(13) 


when it is assumed that the distributed target completely fills a circular
beam of width $\phi_b$ radians. Comparison of (12) and (13) indicates that

2ir  rV  a (2R/c) 

■ n(R)  = 2 / p(a,2R/c)  da  (14) 


Thus,  the  radar  cross-section  per  unit  volume  can  be  determined  from  the 
scatterer density function, which can in turn be measured by observing the
spectral density of the correlator output.

The Doppler Spectrum


The spectral density corresponding to (11) is shown in Figure 2 for a
hypothetical situation. The flat portion of the spectrum, labeled $S_{c+n}$, is
due to receiver noise and return signals from ranges other than those in the
range cell having a delay of $\tau_o$. The peak in the spectrum comes from returns
in the range cell of interest and has the shape of the scatterer density
function. The objective of the spectrum measurement is to determine the
magnitude, $S_p$, and shape of this peak. The accuracy with which this can be
done is clearly a function of the ratio of $S_p$ to $S_{c+n}$ and the frequency
resolution capability of the spectral estimation procedure. This aspect is
discussed subsequently.

Letting v be any value of radial velocity, it is possible to define
a velocity density, $p(v)$, where $p(v)\,\Delta v$ has the physical significance of
representing the fraction of the total return from the range cell of interest
that comes from scattering points having a relative velocity in the interval
from $v$ to $v + \Delta v$. This velocity density is related to the spectral density
by


p(v)  = j [Sp(^  + f ) - S 1 
A R X o c+n 
c 


(15) 


where  A is  the  area  under  the  peak  of  the  spectral  density  and  is  defined  by 


$$A = \int_{0}^{\infty} [S_R(f) - S_{c+n}] \, df \qquad (16)$$


It  is  clear  from  the  above  discussion  that  an  estimate  of  the  spectral 
density  of  the  correlator  output  over  a sufficient  range  of  frequencies  will 
lead directly to an estimate of the velocity density of the distributed
target in the range cell of interest. If the range at which the measurement is
made  is  changed,  and  if  the  measurements  are  repeated  in  a sufficient  number 
of  time  intervals,  then  a complete  history  of  the  velocity  density  function 
as  a function  of  range  and  time  can  be  obtained. 


Accuracy  Considerations 


Before describing a technique for estimating the spectral density, it
is appropriate to state some fundamental limits on the accuracy with which
this spectrum can be estimated. If the estimate of $S_R(f)$ is denoted by $\hat{S}_R(f)$,
a convenient measure of the goodness of the estimate is its signal-to-noise
ratio defined by

$$(S/N)_o = \frac{E^2[\hat{S}_R(f)]}{\mathrm{Var}[\hat{S}_R(f)]} \qquad (17)$$

where $E[\hat{S}_R(f)]$ is the mean value of the estimate at any specified frequency
and $\mathrm{Var}[\hat{S}_R(f)]$ is the variance of the estimate.

It has been shown [6] that this signal-to-noise ratio can be expressed
in the form

$$(S/N)_o = 2\,[1 + 2WT] \left[ \frac{I}{I+1} \right]^2 \qquad (18)$$

where W is the desired frequency resolution, T is the time interval over which
the correlator output is observed, and


I “ S ^2B^ 

c+n 

where $f_s$ is the rate at which samples are taken from each transmitted pulse
and B is the transmitted signal bandwidth.


Two things are immediately apparent: the quantity I must be larger than
one if the last factor in (18) is not to have a significant effect, and if
$I \gg 1$ then $(S/N)_o$ is determined almost entirely by the desired frequency
resolution W and the observation time T. It is important to note that better
frequency resolution (that is, smaller values of W) leads to poorer
signal-to-noise ratios.

As  a means  of  indicating  the  magnitudes  of  some  of  these  quantities, 
consider  the  numerical  values  in  Table  I. 

Table I. Numerical Values Used For Illustration

$P_T$ = 1 W                          $f_c$ = 9 GHz
$\phi_b$ = 0.018 radians (1°)        $N_o$ = 1.6M x 10^-20 ($T_{eff}$ = 1200 °K)
B = 75 MHz                           $\eta$ = 10^-7 m^2/m^3
$R_o$ = 1000 m                       W = 120 Hz
T = 0.1 seconds                      $f_s$ = 15 MHz

The radar cross-section per unit volume, $\eta$, appearing in the above table
corresponds to rainfall at a rate of 1.25 mm/hr. For these parameters the
quantity I turns out to be I = 3.27 and the resulting signal-to-noise ratio
is

$$(S/N)_o = 2\,(1 + 2 \times 120 \times 0.1) \left[ \frac{3.27}{3.27 + 1} \right]^2 = 29.3 \text{ or } 14.7 \text{ dB}$$

Although this value is not large, it is adequate for many measurements.
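
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: it assumes that (18) has the form (S/N)_o = 2(1 + 2WT)[I/(I + 1)]^2, which is the form that reproduces the quoted figure of 29.3, and the function name `spectral_snr` is introduced here purely for illustration.

```python
import math


def spectral_snr(w_hz, t_sec, i_ratio):
    """Signal-to-noise ratio of the spectral estimate, assuming the form
    (S/N)_o = 2 (1 + 2 W T) (I / (I + 1))**2 for Equation (18)."""
    return 2.0 * (1.0 + 2.0 * w_hz * t_sec) * (i_ratio / (i_ratio + 1.0)) ** 2


# Values taken from Table I and the text: W = 120 Hz, T = 0.1 s, I = 3.27
snr = spectral_snr(120.0, 0.1, 3.27)
snr_db = 10.0 * math.log10(snr)
print(f"(S/N)_o = {snr:.1f} ({snr_db:.1f} dB)")
```

Halving W in this sketch roughly halves the result, which illustrates the trade between frequency resolution and estimate quality noted above.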

Doppler Spectrum Estimation

There are several ways in which the spectral density of the correlator
output can be estimated. The most obvious method is the use of bandpass
filters and a square-law envelope detector as shown in Figure 3. This will
estimate the spectral density at one frequency. In order to obtain a complete
spectrum, the bandpass filter must either be tunable or a bank of bandpass
filters must be used.

A more effective technique for this application, however, appears to be
the use of the autoregressive spectral estimator. This technique has been
widely discussed in the literature [7], [8] and an empirical comparison with
other standard techniques has been made [9]. The basic philosophy of this
approach is to fit a finite-order autoregressive process (AR) to the observed
spectrum in a minimum mean-squared error sense. This leads to a computationally
efficient spectral estimate that is capable of excellent resolution.

If $\{z_k\}$ are a sequence of equally spaced samples from a random process,
then an Lth-order AR process is defined by the difference equation

$$z_k = \sum_{i=1}^{L} a_i z_{k-i} + n_k \qquad (20)$$

where the $n_k$ are samples of a stationary Gaussian white noise process having
zero mean and a variance of $\sigma_n^2$.

In  order  to  estimate  the  spectral  density  of  the  process  from  which  the 
samples  are  taken,  the  following  quantities  need  to  be  estimated: 

1)  Samples  of  the  estimated  autocorrelation  are  obtained  from 

r(,)-^  1 (2” 

after subtracting out the sample mean. When N is large, this computation
can be done more efficiently through the use of the Fast
Fourier  Transform. 

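2) The AR coefficients, $\{a_i\}$, can be estimated from the $\{\hat{r}(i)\}$ by
solving the matrix equation

$$\mathbf{R}\,\mathbf{a}^T = \mathbf{r}^T \qquad (22)$$

where

$$\mathbf{r} = [\hat{r}(1) \;\cdots\; \hat{r}(L)], \qquad \mathbf{a} = [a_1 \;\cdots\; a_L],$$

$$\mathbf{R} = \begin{bmatrix} \hat{r}(0) & \hat{r}(1) & \cdots & \hat{r}(L-1) \\ \hat{r}(1) & \hat{r}(0) & \cdots & \hat{r}(L-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}(L-1) & \hat{r}(L-2) & \cdots & \hat{r}(0) \end{bmatrix}$$

This equation can be solved efficiently using the Levinson algorithm.

3) Estimate the white noise variance, $\sigma_n^2$, from

$$\sigma_n^2 = \hat{r}(0) - \sum_{i=1}^{L} a_i \hat{r}(i) \qquad (23)$$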


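The Lth-order AR estimate of the spectral density is then obtained from

$$\hat{S}_R(f) = \frac{\sigma_n^2 \, \Delta T}{\left| 1 - \sum_{i=1}^{L} a_i e^{-j 2\pi f i \Delta T} \right|^2}, \qquad |f| \le \frac{1}{2 \Delta T} \qquad (24)$$

where $\Delta T$ is the sampling period.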
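
The three estimation steps and the final spectral formula can be sketched in standard-library Python as follows. This is an illustrative sketch rather than the authors' program. It assumes the AR model z_k = sum_i a_i z_(k-i) + n_k, a biased (divide-by-N) autocorrelation estimate in step 1, since the exact normalization of (21) is not legible in the source, and a direct summation instead of the FFT. The function names are hypothetical.

```python
import cmath
import random


def autocorrelation(z, max_lag):
    """Step 1: sample autocorrelation r(0..max_lag) after removing the mean.
    The divide-by-N normalization is an assumption of this sketch."""
    n = len(z)
    mean = sum(z) / n
    d = [v - mean for v in z]
    return [sum(d[k] * d[k + i] for k in range(n - i)) / n
            for i in range(max_lag + 1)]


def levinson(r, order):
    """Steps 2 and 3: solve the Toeplitz system R a = r by the Levinson
    recursion; returns the coefficients and the white-noise variance."""
    a = []
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= 1.0 - k * k
    return a, err


def ar_spectrum(a, noise_var, dt, f):
    """AR spectral density at frequency f (Hz) for sampling period dt (s)."""
    s = sum(a[i] * cmath.exp(-2j * cmath.pi * f * (i + 1) * dt)
            for i in range(len(a)))
    return noise_var * dt / abs(1.0 - s) ** 2


# Usage on synthetic data: a second-order AR process with known coefficients
rng = random.Random(1)
z = [0.0, 0.0]
for _ in range(20000):
    z.append(0.75 * z[-1] - 0.5 * z[-2] + rng.gauss(0.0, 1.0))
coeffs, variance = levinson(autocorrelation(z, 2), 2)
spectrum = [ar_spectrum(coeffs, variance, 1.0, f / 100.0) for f in range(51)]
```

Because the recursion needs only L + 1 autocorrelation lags, the cost of the fit grows with L rather than with the record length N once the lags are available.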

Since  the  AR  spectral  estimator  is  nonlinear,  analytical  derivation  of 
its  statistical  properties  is  very  difficult.  However,  some  asymptotic 
properties  have  been  derived  in  the  literature  [11],  [12].  These  results 
indicate  that  the  AR  estimator  has  asymptotic  statistical  properties  that  are 
similar  to  the  Fourier  transform  methods  using  a rectangular  window. 

An  empirical  investigation  of  the  frequency  resolution  and  accuracy 
properties  of  the  AR  spectral  estimator  has  also  been  done  [9].  This  study 
indicates  that  a specified  resolution  for  an  AR  estimator  can  be  obtained 
with a much smaller number of lag values (i.e., L) than is possible with
Fourier  transform  methods.  Furthermore,  the  frequency-averaged  variance  of 
the  AR  spectral  estimate  is  essentially  the  same  as  for  Fourier  transform 
methods  using  a rectangular  window  for  a given  ratio  of  L/N.  However,  since 
L can  be  smaller  for  the  AR  estimator  for  a given  frequency  resolution,  the 
net  result  is  that  the  frequency-averaged  variance  of  the  AR  technique  will 
be  smaller  than  that  of  the  Fourier  transform  technique  when  they  are  compared 
on  the  basis  of  equivalent  resolving  power. 

Experimental  Results 

The  techniques  described  above  have  been  used  with  the  random  signal 
radar  to  estimate  the  doppler  spectrum  of  rainfall.  The  characteristics  of 
the  radar  are  displayed  in  Table  II. 

Table II. Random Signal Radar Used for Estimating
Doppler Spectrum of Rainfall

Frequency, $f_c$                             8.94 GHz
Average transmitted power/pulse, $P_T$       26 mW
Pulse repetition rate                        1 MHz
Sampling rate, $f_s$                         1 MHz
Range resolution, $\Delta R$                 4.5 m
Antenna                                      30" paraboloid
Range, $R_o$                                 100 m
Observation time, T                          0.36 seconds

Figure 4 shows the estimated doppler spectrum computed from N = 3000
samples ($\Delta T$ = 0.12 ms) for several different values of L. This figure shows
that the spectral peak is fully resolved for L = 250 [13].

Conclusions

The main point to be emphasized in this discussion is that the excellent
range resolution characteristic of the random signal radar can be combined
with the excellent frequency resolution characteristic of the autoregressive
spectral estimator to obtain a system that is capable of measuring the complete
doppler spectrum of small regions in a continuously distributed radar
target. This combination of characteristics is very difficult to achieve
with conventional pulse radar without encountering serious ambiguity problems.
Thus, the approach described here provides a useful technique for a wide range
of problems involving distributed targets, discrete targets embedded in
distributed targets, moving clutter measurement or any situation in which there
is relative motion of scatterers that cannot be resolved in either range or
doppler.

Figure 3. Filter method of estimating spectral density.


Figure 4. Doppler spectrum for rainfall with the antenna pointing 90°
above the horizon [13].

INSTANTANEOUS FREQUENCY ESTIMATION
FROM  SAMPLED  DATA 


WILLIAM  R.  CARMICHAEL 
DR.  RICHARD  G.  WILEY 


Abstract 


Estimating the instantaneous frequency of a frequency modulated signal in
the presence of noise is simplified when the input signal-to-noise ratio (SNR)
is greater than about 12 dB. The techniques described here use the zero
crossings of the signal as a basis for estimating the instantaneous frequency of
data provided by the Rome Air Development Center (RADC). The estimated rms
error is 0.7%.


Introduction 


Preliminary Analysis

For white noise and an input SNR of 12 dB or more, it is well known that
the SNR out of an FM demodulator is given by

$$\mathrm{SNR}_{out} = 3 \, \mathrm{SNR}_{in} \left( \frac{\Delta F}{F_m} \right)^2 \qquad (1)$$

where:

$\Delta F$ = peak frequency deviation
$F_m$ = highest modulating frequency

For the signal data supplied by RADC, the input SNR is given as about
20  dB,  and  the  input  signal  to  noise  plus  interference  ratio  is  about  17  dB. 
Equation  1 may  be  applied  in  this  case  since  the  interference  is  small. 

RADC  also  specified  that  the  spectrum  of  interest  was  from  130  to  270  Hz 
and  that  the  rate  of  frequency  drift  was  no  greater  than  190  Hz/s. 

For a single sinusoidal modulation, the maximum rate of change of frequency
is given by

$$\text{Maximum Slope} = \Delta F \cdot 2\pi F_m \qquad (2)$$

The largest modulating frequency is, therefore,

$$F_m = \frac{\text{Maximum Slope}}{2\pi \, \Delta F} \qquad (3)$$

For a given slope, the maximum modulating frequency is largest when the
peak deviation is the smallest. Examining the spectrum of the input (see
Figure 1A) shows a total frequency deviation of at least 60 Hz. Then the
minimum peak deviation is 30 Hz and the maximum modulating frequency is

$$\text{Maximum } F_m = \frac{190}{2\pi \cdot 30} \approx 1 \text{ Hz} \qquad (4)$$

In this situation, the output SNR should be approximately

$$\mathrm{SNR}_{out} = 17 \text{ dB} + 10 \log \left[ 3 (30)^2 \right] = 51 \text{ dB} \qquad (5)$$

Therefore,  an  ordinary  FM  demodulator  will  provide  good  performance.  Note, 
however,  that  the  output  bandwidth  of  such  a system  would  be  on  the  order  of 
1 Hz. Because the data sample available is only 0.64 s in duration, the
substantial transient effects due to use of a 1 Hz filter would severely affect
the estimates of the instantaneous frequency within the 0.64 s record.
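
The figures in this preliminary analysis follow from Equations 1 through 5 and can be reproduced with the short sketch below. It assumes, as the text does, that the 17 dB signal to noise plus interference ratio is used as the input SNR and that the maximum modulating frequency is rounded to 1 Hz; the variable names are illustrative only.

```python
import math

max_slope_hz_per_s = 190.0   # maximum rate of frequency drift given by RADC
peak_deviation_hz = 30.0     # half of the 60 Hz total deviation read from Figure 1A
snr_in_db = 17.0             # input signal to noise plus interference ratio

# Equations 3 and 4: largest modulating frequency for the given slope
f_mod_max = max_slope_hz_per_s / (2.0 * math.pi * peak_deviation_hz)

# Equations 1 and 5, with the modulating frequency rounded to 1 Hz as in the text
f_mod = float(round(f_mod_max))
snr_out_db = snr_in_db + 10.0 * math.log10(3.0 * (peak_deviation_hz / f_mod) ** 2)

print(f"maximum F_m = {f_mod_max:.2f} Hz, SNR_out = {snr_out_db:.0f} dB")
```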

Data  Prefiltering 

Figure  1A  presents  a plot  of  the  spectrum  of  the  raw  input  data.  Since 
the  data  is  narrowband,  bandpass  filtering  was  applied  to  improve  the  SNR.  The 
filter of Figure 1B is a 63 point FIR filter designed with McClellan's Remez
Exchange Algorithm [4]. The filter passband extends from 130 Hz to 210 Hz for
a sampling rate of 800 Hz. Note that the upper band edge of the data was estimated
to be 210 Hz; the value given in the problem statement was 270 Hz. An FIR
filter was chosen because of its ideal phase characteristic. The peak in-band
amplitude ripple is -30 dB. In Figure 1C, the spectrum of the data after
bandpass prefiltering is presented.

Zero  Crossings  and  Frequency  Estimation 

The  average  frequency  of  the  signal  (in  the  absence  of  noise)  is  given  by 
the  reciprocal  of  twice  the  time  between  two  zero  crossings.  In  this  case, 
there  will  be  many  zero  crossings  per  second  compared  to  the  bandwidth  of  the 
modulation.  Assume  that  the  modulating  signal  is  bandlimited.  Then,  in  view 
of the mean value theorem for continuous differentiable functions, the
instantaneous frequency will be equal to the average frequency at some point between
the  zero  crossings  used  to  estimate  the  average  frequency.  In  Appendix  A,  it 
is  demonstrated  that  the  instantaneous  frequency  at  the  midpoint  of  the  interval 
is  equal  to  the  average  frequency  if  the  modulating  signal  is  linear  over  that 
time.  Since  there  are  many  zero  crossings  per  cycle  of  the  highest  modulating 
frequency  present,  the  error  caused  by  assuming  that  the  modulating  signal  is 
approximated  by  a straight  line  over  a few  zero  crossing  intervals  will  be  small. 

Figure 1


Linear  Approximation  Error 


Assume that the modulating signal is of the form $\sin \omega_m t$. Here we
determine the error due to approximating the instantaneous value at the center of the
time interval by the average value over the interval. If the interval is
$(t_1, t_2)$, the actual value at the center of the interval is given by
$\sin \omega_m \left( \frac{t_2 + t_1}{2} \right)$. The average value is

$$\frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \sin \omega_m t \, dt = \frac{2 \sin \omega_m \left( \frac{t_2 + t_1}{2} \right) \sin \omega_m \left( \frac{t_2 - t_1}{2} \right)}{\omega_m (t_2 - t_1)} \qquad (6)$$

The ratio of the estimated value to the actual value of the function is

$$\frac{\text{Estimated Value}}{\text{Actual Value}} = \frac{2 \sin \omega_m \left( \frac{t_2 - t_1}{2} \right)}{\omega_m (t_2 - t_1)} \qquad (7)$$


Figure  2 is  a plot  of  the  absolute  value  of  the  error  in  dB  versus  the 
averaging  interval  expressed  in  degrees. 
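
The curve of Figure 2 can be regenerated from the ratio of estimated to actual value, 2 sin(x/2)/x, where x is the averaging interval expressed as a phase angle of the modulating signal. The sketch below expresses the error as a positive signal-to-error ratio in dB, which is a presentation choice made here; the function name is hypothetical.

```python
import math


def approximation_snr_db(interval_deg):
    """Signal-to-approximation-error ratio in dB for an averaging interval
    of interval_deg degrees of the modulating sinusoid."""
    x = math.radians(interval_deg)
    ratio = 2.0 * math.sin(x / 2.0) / x      # estimated value / actual value
    return -20.0 * math.log10(1.0 - ratio)


for angle in (5.0, 10.0, 20.0, 40.0):
    print(f"{angle:5.1f} deg: {approximation_snr_db(angle):5.1f} dB")
```

For a 20° interval this gives about 45.9 dB, the value read from Figure 2 later in the text.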

Error  Due  to  Noise 


In the high SNR cases, the perturbation of the zero crossings of a sinusoidal
signal by noise will be minor and can be calculated by dividing the amplitude
of the noise at that instant by the slope of the sinusoid as it passes through
zero.


Let the signal be represented by

$$s(t) = A \sin \theta(t) \qquad (8)$$

Differentiating gives

$$\frac{ds(t)}{dt} = A \cos \theta \, \frac{d\theta}{dt} \qquad (9)$$

Then the error in the zero crossing time due to an additive signal of
magnitude N will be (since at that time $\cos \theta = 1$)

$$\Delta T = \frac{N}{A \, d\theta/dt} \qquad (10)$$

Figure 2. Linear Approximation Error (horizontal axis: averaging interval $\omega_m (t_2 - t_1)$, in degrees)

Since $d\theta/dt$ is the instantaneous frequency multiplied by $2\pi$,

$$\Delta T = \frac{N}{2\pi A F_i} \qquad (11)$$

where:

$\Delta T$ = error in zero crossing time due to noise N,
N = instantaneous noise amplitude at zero crossing time,
A = signal peak amplitude, and
$F_i$ = instantaneous frequency.

Because the noise can be considered a normally distributed random variable
independent from one zero crossing to the next, the rms value of the error in
the zero crossing time is simply

$$\sigma_{\Delta T} = \frac{\sigma_N}{2\pi A F_i} \qquad (12)$$

where $\sigma_N$ is the rms noise amplitude. The signal power is given by $A^2/2$.
This means that Equation 12 can be expressed in terms of the SNR as

$$\sigma_{\Delta T} = \frac{1}{2\pi F_i \sqrt{2 \, \mathrm{SNR}}} \qquad (13)$$

Finally, since any time interval would have a similar independent error at
each end of the interval, the rms error in any such interval will be $\sqrt{2}$ times
the value above. Denote this by $\sigma_T$ and then

$$\sigma_T = \frac{1}{2\pi F_i \sqrt{\mathrm{SNR}}} \qquad (14)$$

Note that the rms error in measuring any time interval between zero
crossings is the same regardless of the number of zero crossings included between
the ends of the interval. Thus, if the interval between two zero crossings
separated by K-1 other zero crossings is considered, the rms error in that
interval is also given by Equation 14. In that time, the phase will have advanced
by $K\pi$; thus the reciprocal of twice such an interval will measure the instantaneous
frequency divided by K. If we measure zero crossings at $t_1$ and $t_2$ and count $t_2$
as occurring K zero crossings after $t_1$, the average frequency of the signal
over the interval from $t_1$ to $t_2$ will be given by

$$F_{avg} = \frac{K}{2 (t_2 - t_1)} \qquad (15)$$


We might then designate the value in Equation 15 as the instantaneous
frequency at the midpoint of the interval (subject to the linear approximation
error already discussed). The error in this frequency estimate due to an error
in estimating the interval $(t_2 - t_1)$ is

$$\Delta F = \frac{-K \, \Delta T}{2 (t_2 - t_1)^2} \qquad (16)$$

Using Equation 14, the rms frequency error is expressed as

$$\sigma_F = \frac{K \, \sigma_T}{2 (t_2 - t_1)^2} = \frac{K}{4\pi F_i \sqrt{\mathrm{SNR}} \, (t_2 - t_1)^2} \qquad (17)$$

Note that, approximately, $F_i (t_2 - t_1) = K/2$, so that

$$\frac{\sigma_F}{F_i} = \frac{1}{\pi K \sqrt{\mathrm{SNR}}} \qquad (18)$$

Hence, an improvement in the SNR at the output proportional to the square
of the number of zero crossings separating $t_2$ from $t_1$ is expected, i.e.

$$\mathrm{SNR}_{out} = \pi^2 K^2 \, (\mathrm{SNR})_{in} \qquad (19)$$

Since  large  K implies  large  time  intervals,  as  K increases,  more  error  due 
to  approximating  the  instantaneous  frequency  at  the  midpoint  of  the  interval  by 
the  average  over  the  interval  is  introduced. 

A typical instantaneous frequency in the data record is 170 Hz. If
$F_m$ = 1 Hz, for $\omega_m (t_2 - t_1)$ = 20°, $(t_2 - t_1)$ will be approximately 0.056 s. In
this case, K = 2 x 170 x 0.056 = 19. The output SNR will then be 52.5 dB
according to Equation 19, and the output signal to approximation error will be 45.9 dB
according to Figure 2. The combined SNR from both errors would be about 45.0 dB.
This  approach  to  estimating  the  instantaneous  frequency  would  yield  an  accuracy 
of  about 
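The arithmetic of the worked example above can be checked with a few lines of Python. This is only a sketch: the helper names are illustrative, and the combination rule assumes the two error sources are independent, so that their noise powers add (an assumption that reproduces the quoted 45.0 dB).

```python
import math

def db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

def combine_snr_db(*snrs_db):
    """Combine SNRs (in dB) of independent error sources by adding noise powers."""
    noise = sum(10.0 ** (-s / 10.0) for s in snrs_db)
    return -10.0 * math.log10(noise)

f_inst = 170.0      # typical instantaneous frequency, Hz (from the text)
interval = 0.056    # zero crossing separation t2 - t1, s (from the text)

# Number of zero crossings spanned: K = 2 * F * (t2 - t1), rounded.
k = round(2 * f_inst * interval)

# SNR improvement predicted by Equation 19: pi^2 * K^2, expressed in dB.
gain_db = db(math.pi ** 2 * k ** 2)

# Combined effect of the 52.5 dB noise term and the 45.9 dB approximation term.
combined_db = combine_snr_db(52.5, 45.9)

print(k, round(gain_db, 1), round(combined_db, 1))
```

With K = 19 the predicted improvement is roughly 35.5 dB, so the 52.5 dB output figure corresponds to an input SNR of about 17 dB.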


Furthermore,  the  approach  is  quite  simple: 


Step  1 locate  the  zero  crossings. 

Step  2 compute  the  reciprocal  of  twice  the  interval 
between  those  zero  crossings  separated  by 
about  0.056  s. 

Step  3 multiply  those  estimates  by  the  number  of 
zero  crossings  separating  the  ends  of  each 
such  interval. 

Step  4 assign this value as the instantaneous
frequency at the center of the interval.
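The four steps above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' program: the function names are hypothetical, a fixed K is used instead of selecting crossings about 0.056 s apart, and the test signal (170 Hz carrier, ±30 Hz deviation, 1 Hz modulation, sampled at 4000 Hz) is a synthetic assumption chosen to resemble the frequencies discussed in the text.

```python
import numpy as np

def zero_crossings(t, x):
    """Step 1: locate zero crossings by linear interpolation at each sign change."""
    s = np.signbit(x)
    idx = np.nonzero(s[:-1] != s[1:])[0]
    frac = x[idx] / (x[idx] - x[idx + 1])   # fraction of the sample interval
    return t[idx] + frac * (t[idx + 1] - t[idx])

def instantaneous_frequency(tz, k):
    """Steps 2-4: K / (2 * (t2 - t1)), assigned to the midpoint of each interval."""
    t1, t2 = tz[:-k], tz[k:]
    return 0.5 * (t1 + t2), k / (2.0 * (t2 - t1))

def f_true(tt):
    """Synthetic instantaneous frequency in Hz (assumed for this sketch)."""
    return 170.0 + 30.0 * np.sin(2 * np.pi * tt)

fs = 4000.0
t = np.arange(0.0, 0.7, 1.0 / fs)
# Phase is 2*pi times the integral of f_true, plus an arbitrary offset.
phase = 2 * np.pi * 170.0 * t - 30.0 * np.cos(2 * np.pi * t) + 0.3
x = np.sin(phase)

tz = zero_crossings(t, x)
t_mid, f_est = instantaneous_frequency(tz, k=19)
max_err = np.max(np.abs(f_est - f_true(t_mid)))
print(len(tz), max_err)
```

In this noise-free case the residual error comes almost entirely from replacing the midpoint frequency by the interval average, which is the approximation error discussed earlier.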


Final  Interpolation 


The  results  of  the  steps  above  are  instantaneous  frequency  estimates  for 
various  times  which  are  neither  equally  spaced  nor  located  at  the  times  for 
which  estimates  are  desired.  The  estimates  are  available  at  intervals  of  about 
0.028  s,  and  the  desired  outputs  are  located  at  multiples  of  0.04  s.  If  the 
modulation  is  at  a 1 Hz  rate,  we  obtain  a sample  about  every  10°.  With  samples 
spaced  10°  apart,  the  sine  function  may  be  interpolated  linearly  to  0.4%.  Since 
the  error  caused  by  this  is  slightly  less  than  the  error  involved  in  obtaining 
the  instantaneous  frequency  estimates  at  the  interval  midpoints,  this  is  a 
satisfactory  technique. 
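The 0.4% figure can be checked numerically, and the same NumPy call performs the final resampling. The sketch below assumes a unit-amplitude sine sampled every 10°; the sample times in the second part are hypothetical values, chosen only to show interpolation from unequally spaced estimates to a requested output time.

```python
import numpy as np

def max_linear_interp_error(spacing_deg, n_fine=2001):
    """Worst-case error of linearly interpolating a unit sine sampled every spacing_deg."""
    h = np.radians(spacing_deg)
    nodes = np.arange(0.0, 2 * np.pi + h, h)
    fine = np.linspace(0.0, 2 * np.pi, n_fine)
    approx = np.interp(fine, nodes, np.sin(nodes))
    return np.max(np.abs(approx - np.sin(fine)))

err_10deg = max_linear_interp_error(10.0)

# Hypothetical unequally spaced estimate times (s) around the 0.04 s output point.
t_est = np.array([0.010, 0.039, 0.066, 0.095])
f_est = 170.0 + 30.0 * np.sin(2 * np.pi * t_est)   # assumed 1 Hz modulation
f_at_004 = np.interp(0.04, t_est, f_est)

print(err_10deg, f_at_004)
```

The worst-case error agrees with the usual bound h²/8 for linear interpolation of a unit sine, where h is the sample spacing in radians.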


Zero Location Procedure


The original data is sampled at 800 Hz. Since the frequency approaches
200 Hz, we have samples at 90° intervals in the worst case. It would be
convenient to locate the zero crossings by linear interpolation whenever the data
changes sign. However, this would introduce errors of up to 5% of the sampling
interval of 0.00125 s, as shown by Figure 3. This would cause an error in the
instantaneous frequency estimates of


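$$\frac{\Delta F}{F_i} = \frac{K\,\Delta T}{2(t_2 - t_1)^2\,F_i} = \frac{\Delta T}{t_2 - t_1}$$

$$\frac{\Delta F}{F_i} \approx \frac{\Delta T}{T_2 - T_1} = \frac{(0.00125)(0.05)}{0.056} \approx 1.1 \times 10^{-3}$$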

While this is sufficiently small, we chose to increase the sampling rate
by a factor of five using a finite impulse response low pass filter [5]. This
allowed for samples at 4000 Hz. In the worst case of 200 Hz for the signal,
there are then 20 samples per cycle (18°) and the worst case error was reduced
to 0.15% of the sampling interval, or $\Delta F / F_i \approx 7.7 \times 10^{-6}$, due to the use of linear
interpolation to initially locate the zeros. This step is clearly not necessary.
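The worst-case zero location errors quoted above can be reproduced approximately with a small model. This is a sketch under stated assumptions: the signal is treated as a pure unit sinusoid near the crossing, the two samples bracketing the zero are a fixed phase spacing apart, and the function name is illustrative.

```python
import numpy as np

def worst_zero_location_error(spacing_deg, n=10001):
    """Worst-case error, as a fraction of the sampling interval, of locating a
    sine zero by linear interpolation between two samples spacing_deg apart."""
    d = np.radians(spacing_deg)
    # True zero lies p radians after the first sample, 0 < p < d.
    p = np.linspace(0.0, d, n)[1:-1]
    # Sample magnitudes are sin(p) before and sin(d - p) after the zero.
    est = np.sin(p) / (np.sin(p) + np.sin(d - p))
    return np.max(np.abs(est - p / d))

err_800 = worst_zero_location_error(90.0)    # 200 Hz signal sampled at 800 Hz
err_4000 = worst_zero_location_error(18.0)   # 200 Hz signal sampled at 4000 Hz

# Relative frequency error dT / (t2 - t1) for a 0.056 s measurement interval.
rel_800 = err_800 * (1.0 / 800.0) / 0.056
rel_4000 = err_4000 * (1.0 / 4000.0) / 0.056

print(err_800, err_4000, rel_800, rel_4000)
```

Under this model the 90° case gives an error a little under 5% of the sampling interval and the 18° case gives roughly 0.15%, in line with the values read from Figure 3.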

Unequal  Sample  Spacing 

A technique  [1]  for  reconstructing  bandlimited  signals  from  unequally 
spaced  samples  using  an  iteration  could  be  applied  at  the  final  stage  to  obtain 
greater accuracy. Due to the close spacing of the samples, this seemed unnecessary.
Similarly, the FM demodulation technique based on the same iteration [2]
could  be  applied.  This  also  seemed  unnecessary  since  the  modulation  rate  is  so 
low  compared  to  the  carrier.  Under  more  difficult  circumstances,  this  technique 
would  be  very  useful. 


Final  Results 


The estimated instantaneous frequency using the reciprocal of twice each
zero crossing interval is shown in Figure 4A; the noise reduction effect of
using zero crossings spaced by about 0.056 s is shown in Figure 4B. The
estimates at the 16 times requested by RADC are given in Table 1. These values are
estimated to have an rms error of 0.7% from the error in the original estimates
and the error due to the final interpolation.


Figure 3. Error due to Linear Interpolation for Zero Location
(horizontal axis: sampling point location, degrees)


Table 1. Instantaneous Frequency Estimates at 0.04 s Intervals

Sample    Time (s)    Instantaneous Frequency (Hz)

C1        0.04        140.9301
C2        0.08        146.0931
C3        0.12        152.9369
C4        0.16        160.6346
C5        0.20        169.9827
C6        0.24        179.5855
C7        0.28        187.5818
C8        0.32        194.2402
C9        0.36        197.8844
C10       0.40        199.8585
C11       0.44        197.8066
C12       0.48        193.4298
C13       0.52        186.9661
C14       0.56        179.3949
C15       0.60        169.5555
C16       0.64        159.7161

Appendix  A 


Linear  FM  Demodulation  Using  Zero  Crossings 
Analysis  of  Zero  Crossing  Intervals 

Assume that the FM signal is linear at least through three zero crossings,
so that this segment of the FM wave is given by

$$S(t) = \sin(\omega_c t + a t^2) \tag{A-1}$$

In this case,

$$\phi(t) = \omega_c t + a t^2 = \text{total phase} \tag{A-2}$$

$$\theta(t) = a t^2 = \text{phase modulation} \tag{A-3}$$

The instantaneous frequency is

$$\frac{d\phi}{dt} = \omega_c + 2at$$

or

$$f = \frac{1}{2\pi}(\omega_c + 2at) \tag{A-4}$$


The slope of the modulating signal is 2a, and $\omega_c$ is the carrier frequency.
Clearly, the zero crossings of S(t) occur when the phase is a multiple of π:

$$\phi(t) = n\pi = \omega_c t + a t^2 \tag{A-5}$$

where

$$n = 0, \pm 1, \pm 2, \ldots \tag{A-6}$$

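The values of t at which these zero crossings occur can be obtained by solving
the quadratic in Equation A-5, so that

$$t_n = \frac{\omega_c}{2a}\left[-1 \pm \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}\,\right] \tag{A-7}$$

The zero crossing interval can be found by taking the difference between two
adjacent zero crossing times. Taking the positive root in Equation A-7, for
which $t_n$ approaches $n\pi/\omega_c$ as a approaches zero:

$$\Delta t_{n+1} = t_{n+1} - t_n \tag{A-8}$$

$$\Delta t_{n+1} = \frac{\omega_c}{2a}\left[\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} - \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}\,\right] \tag{A-9}$$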

Computation of Instantaneous Frequency

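Babbit and Leon [3] show that the average frequency over the zero crossing
interval is equal to the reciprocal of twice the interval; therefore:

$$f_{n+1} = \frac{1}{2\,\Delta t_{n+1}} \tag{A-10}$$

$$f_{n+1} = \frac{a}{\omega_c\left[\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} - \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}\,\right]} \tag{A-11}$$

Multiplication of Equation A-11 by

$$1 = \frac{\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}}{\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}} \tag{A-12}$$

simplifies the expression for frequency:

$$f_{n+1} = \frac{\omega_c}{4\pi}\left[\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}\,\right] \tag{A-13}$$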


By  virtue  of  the  mean  value  theorem  for  continuous  differentiable  functions, 
it  is  clear  that  the  average  value  over  the  interval  must  be  the  actual  value  at 
some  instant  within  the  interval.  Since  the  modulating  function  in  question  is 
linear,  it  is  expected  that  the  average  given  by  Equation  A-13  would  be  the 
instantaneous value at the midpoint of the interval. To check this, the
midpoint instantaneous frequency can be computed.


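$$t_{\text{midpoint}} = \frac{t_{n+1} + t_n}{2} \tag{A-14}$$

$$t_{\text{midpoint}} = \frac{\omega_c}{4a}\left[\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}} - 2\right] \tag{A-15}$$

Substituting this value of time for the midpoint given by Equation A-15 into
the expression for instantaneous frequency given by Equation A-4 yields the
following result:

$$f_{\text{midpoint}} = \frac{1}{2\pi}\left[\omega_c + \frac{\omega_c}{2}\left(\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}} - 2\right)\right] \tag{A-16}$$

$$f_{\text{midpoint}} = \frac{\omega_c}{4\pi}\left[\sqrt{1 + \frac{4a\pi(n+1)}{\omega_c^2}} + \sqrt{1 + \frac{4a\pi n}{\omega_c^2}}\,\right] \tag{A-17}$$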


Since Equations A-17 and A-13 are the same, the fact that the average
frequency over the interval is equal to the instantaneous frequency at the midpoint
of the interval for a linear FM signal has been proven.
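The result can also be confirmed numerically. The sketch below uses illustrative values (a 170 Hz carrier and a 100 Hz/s sweep, neither taken from the report) and a hypothetical helper for the positive root of the quadratic in Equation A-5. It compares the reciprocal of twice a zero crossing interval with the instantaneous frequency at the interval midpoint.

```python
import math

def zero_crossing_time(n, omega_c, a):
    """Positive root of a*t^2 + omega_c*t - n*pi = 0 (Equation A-5)."""
    return (omega_c / (2 * a)) * (-1 + math.sqrt(1 + 4 * a * math.pi * n / omega_c ** 2))

# Illustrative parameters (assumptions for this check).
omega_c = 2 * math.pi * 170.0   # carrier, rad/s
a = 2 * math.pi * 50.0          # frequency slope 2a/(2*pi) = 100 Hz/s
n = 40

t_n = zero_crossing_time(n, omega_c, a)
t_n1 = zero_crossing_time(n + 1, omega_c, a)

# Average frequency over the interval: reciprocal of twice the interval.
f_avg = 1.0 / (2.0 * (t_n1 - t_n))

# Instantaneous frequency (omega_c + 2*a*t) / (2*pi) evaluated at the midpoint.
t_mid = 0.5 * (t_n + t_n1)
f_mid = (omega_c + 2 * a * t_mid) / (2 * math.pi)

# Closed form: (omega_c / (4*pi)) times the sum of the two square roots.
root_n = math.sqrt(1 + 4 * a * math.pi * n / omega_c ** 2)
root_n1 = math.sqrt(1 + 4 * a * math.pi * (n + 1) / omega_c ** 2)
f_closed = omega_c / (4 * math.pi) * (root_n1 + root_n)

print(f_avg, f_mid, f_closed)
```

The three values agree to within floating-point rounding for any choice of n, carrier, and slope for which the square roots are real.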